python - Browser rendered URL and Scraped URL are different. Please explain -

- July 15, 2014

I am new to the world of web scraping, Python, and Scrapy, so pardon me if there is a fundamental flaw in my understanding. I come from a Java/R background. I am trying to scrape book details from www.amazon.in. I built the required XPaths using Chrome's XPath finder, but when I try the same XPath query in the Scrapy shell, a different form of the URL is displayed.

For example, for the XPath query //ul[@id='ref_976390031']/li[23]/a[@href]/@href, in the XPath finder I get

www.amazon.in/s/ref=lp_976389031_nr_n_21?fst=as%3aoff&rh=n%3a976389031%2cn%3a%21976390031%2cn%3a1318203031&bbn=976390031&ie=utf8&qid=1418660681&rnid=976390031

But when I try it on the response variable of the Scrapy shell, response.xpath("//ul[@id='ref_976390031']/li[23]/a[@href]/@href").extract()

I get

http://www.amazon.in/b?ie=utf8&node=1318203031

What's more interesting is that the scraped link, when keyed into the browser, lands on a different page from the one it is supposed to land on (the same behaviour, i.e. landing on a different page, occurs when scraping too).

One more thing I have observed: although the scraped links differ from the browser-rendered links, some of them are directed/redirected properly, while others are not.

This behaviour makes the scraper scrape only some of the links, while other links are not scraped at all.

Any help with, or explanation of, this behaviour is appreciated. Thanks in advance.

Kyle K and warvariuc are right: the site renders different URLs for different user agents.
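One way to confirm this kind of behaviour outside Scrapy is to fetch the same page twice, once with the client's default user agent and once with a browser-like one, and compare the links that come back. The following is only a minimal sketch using the Python standard library; the helper names (`extract_hrefs`, `build_request`, `fetch_hrefs`, `diff_links`) are made up for illustration, and it collects every `<a href>` on the page rather than evaluating the XPath from the question.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Browser-like user agent string (the one used in the fix in this post).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"
)


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all link targets found in an HTML string."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


def build_request(url, user_agent=None):
    """Build a request; without user_agent, urllib sends its own default."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return Request(url, headers=headers)


def fetch_hrefs(url, user_agent=None):
    """Download a page (network access required) and return its links."""
    with urlopen(build_request(url, user_agent), timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_hrefs(html)


def diff_links(url):
    """Return (links seen only by default UA, links seen only by browser UA)."""
    default_links = set(fetch_hrefs(url))
    browser_links = set(fetch_hrefs(url, BROWSER_UA))
    return default_links - browser_links, browser_links - default_links
```

Calling `diff_links(page_url)` on the category page should return two non-empty sets if the site really serves different links per user agent. Note that some sites may reject the default user agent outright, in which case `urlopen` raises an `HTTPError` instead.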

Adding the following parameter in settings.py fixed the issue:

USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"

Thank you for taking the time to reply.
