python - Browser-rendered URL and scraped URL are different. Please explain
I am new to the world of web scraping, Python, and Scrapy, so pardon me if there is a fundamental flaw in my understanding; I come from a Java/R background. I am trying to scrape book details from www.amazon.in. I built the required XPaths using Chrome's XPath finder, but when I try the same XPath query in the Scrapy shell, a different form of the URL is displayed.
For example, for the following XPath query:
//ul[@id='ref_976390031']/li[23]/a[@href]/@href
in the XPath finder I get
www.amazon.in/s/ref=lp_976389031_nr_n_21?fst=as%3aoff&rh=n%3a976389031%2cn%3a%21976390031%2cn%3a1318203031&bbn=976390031&ie=utf8&qid=1418660681&rnid=976390031
But when I try it on the response variable in the Scrapy shell:
response.xpath("//ul[@id='ref_976390031']/li[23]/a[@href]/@href").extract()
I get
http://www.amazon.in/b?ie=utf8&node=1318203031
What's more interesting is that the scraped link, when keyed into the browser, lands on a different page from the one it is supposed to land on (the same behaviour, i.e. landing on a different page, occurs when it is scraped too).
One more thing I have observed: while scraping, although the scraped links differ from the browser-rendered links, some of them are directed/redirected properly, while others are not.
This behaviour makes the scraper scrape only some of the links, and the other links are not scraped at all.
Any help with or explanation of this behaviour is appreciated. Thanks in advance.
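One way to see exactly how the two hrefs differ is to parse their query strings side by side. The sketch below uses only the standard library and the two URLs quoted above; reading the `rh` parameter as a comma-separated list of `n:<id>` entries is an assumption drawn from its decoded form, not documented behaviour of the site.

```python
from urllib.parse import urlsplit, parse_qs

# The two hrefs exactly as quoted in the question.
browser_href = (
    "www.amazon.in/s/ref=lp_976389031_nr_n_21?fst=as%3aoff"
    "&rh=n%3a976389031%2cn%3a%21976390031%2cn%3a1318203031"
    "&bbn=976390031&ie=utf8&qid=1418660681&rnid=976390031"
)
scraped_href = "http://www.amazon.in/b?ie=utf8&node=1318203031"


def query_params(url):
    """Return the query string as a dict of percent-decoded single values."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


browser_q = query_params(browser_href)
scraped_q = query_params(scraped_href)

# Assumption: "rh" decodes to comma-separated "n:<id>" entries.
browser_nodes = [part.split(":", 1)[1] for part in browser_q["rh"].split(",")]

print("browser rh entries:", browser_nodes)
print("scraped node:", scraped_q["node"])
```

The last `rh` entry in the browser URL and the `node` value in the scraped URL carry the same id, which suggests the server is emitting two different link formats for the same list item rather than two unrelated links.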
Kyle K and warvariuc are right: the site renders different URLs for different user agents.
Adding the following parameter in settings.py fixed the issue:
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"
Thank you for taking the time to reply.
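To check the user-agent explanation outside Scrapy, one option is to request the same page with two different User-Agent headers and compare what comes back. This is a minimal standard-library sketch, not part of the original fix: `CRAWLER_UA`, `build_request`, `fetch_html`, and `compare_user_agents` are hypothetical names introduced here for illustration, and the page URL has to be supplied by the caller.

```python
from urllib.request import Request, urlopen

# Browser-like user agent taken from the settings.py fix above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"
)
# Hypothetical stand-in for a non-browser client; not Scrapy's real default.
CRAWLER_UA = "example-crawler/0.1"


def build_request(url, user_agent):
    """Create a request object that carries the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent, timeout=30):
    """Download the page as text while presenting the given user agent."""
    with urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def compare_user_agents(url):
    """Fetch the same URL under both user agents and return both bodies."""
    return {
        "browser": fetch_html(url, BROWSER_UA),
        "crawler": fetch_html(url, CRAWLER_UA),
    }

# Usage (performs network requests, so it is not run here):
# pages = compare_user_agents("<URL of the category page being scraped>")
# Run the same XPath against each body to see whether the hrefs differ.
```

If the hrefs extracted from the two bodies differ, that supports the explanation that the site varies its markup by user agent.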