python - Browser rendered URL and scraped URL are different. Please explain


I am new to the world of web scraping, Python, and Scrapy, so pardon me if there is a fundamental flaw in my understanding. I come from a Java/R background. I am trying to scrape book details from www.amazon.in. I built the required XPaths using Chrome's XPath finder, but when I try the same XPath query in the Scrapy shell, a different form of the URL is displayed.

For example, for the XPath query //ul[@id='ref_976390031']/li[23]/a[@href]/@href, in the XPath finder I get

www.amazon.in/s/ref=lp_976389031_nr_n_21?fst=as%3aoff&rh=n%3a976389031%2cn%3a%21976390031%2cn%3a1318203031&bbn=976390031&ie=utf8&qid=1418660681&rnid=976390031

But when I try it on the response variable in the Scrapy shell with response.xpath("//ul[@id='ref_976390031']/li[23]/a[@href]/@href").extract()

I get

http://www.amazon.in/b?ie=utf8&node=1318203031 
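One way to compare the two forms is to parse their query strings. The sketch below uses only the standard library and the two URLs quoted above (with a scheme added to the first). Reading the "rh" value as a comma-separated list of "n:<id>" entries is an assumption about the site's URL format, not something documented here; under that assumption, the node id in the scraped URL also appears in the browser-rendered one.

```python
from urllib.parse import urlparse, parse_qs

# The two URLs quoted above; "http://" is added to the first so it parses cleanly.
browser_url = ("http://www.amazon.in/s/ref=lp_976389031_nr_n_21?fst=as%3aoff"
               "&rh=n%3a976389031%2cn%3a%21976390031%2cn%3a1318203031"
               "&bbn=976390031&ie=utf8&qid=1418660681&rnid=976390031")
scraped_url = "http://www.amazon.in/b?ie=utf8&node=1318203031"


def query_params(url):
    """Return the percent-decoded query parameters of a URL as a dict of lists."""
    return parse_qs(urlparse(url).query)


browser_params = query_params(browser_url)
scraped_params = query_params(scraped_url)

# Assumption: "rh" holds comma-separated "n:<id>" entries, optionally prefixed with "!".
browser_nodes = [
    part.split(":", 1)[1].lstrip("!")
    for part in browser_params["rh"][0].split(",")
]
scraped_node = scraped_params["node"][0]

print(browser_nodes)
print(scraped_node in browser_nodes)
```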

What's more interesting is that the scraped link, when keyed into the browser, lands on a different page from the one it is supposed to land on (the same behaviour, i.e. landing on a different page, occurs when scraping too).

One more thing I have observed: while scraping, although the scraped links are different from the browser-rendered links, some of them are directed/redirected properly, while some are not.

This behaviour makes the scraper scrape some links, while other links are not scraped at all.

Any help or explanation of this behaviour is appreciated. Thanks in advance.

Kyle K and warvariuc are right: the site is rendering different URLs for different user agents.

Adding the following parameter in settings.py fixed the issue

USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"
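To check the same idea outside Scrapy, a request can be built with a browser-like User-Agent header using only the standard library. This is a minimal sketch, not part of the original fix: the names build_request and fetch_html are hypothetical helpers, and fetch_html is defined but not called here because it needs network access.

```python
from urllib.request import Request, urlopen

# A browser-like User-Agent string, matching the one set in settings.py above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
              "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1")


def build_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a request carrying the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA, timeout=10):
    """Hypothetical helper: download the page as text using the given User-Agent."""
    with urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


req = build_request("http://www.amazon.in/b?ie=utf8&node=1318203031")
# urllib stores header names capitalized, hence "User-agent" in the lookup.
print(req.get_header("User-agent"))
```

Calling fetch_html once with BROWSER_UA and once with a different User-Agent string, then running the same XPath on both results, would be one way to see whether the links differ by user agent.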

Thank you for taking the time to reply.

