python - XPath for sub-element's text value in lxml -


First of all, is such a thing possible?

I have been trying to generate an XPath expression from the text values of sub-elements present in a webpage. I have tried the lxml (etree, html, getpath) and ElementTree modules in Python, but I don't know how to generate an XPath expression from a value present in the webpage. I know about the Scrapy framework in Python, but this is different.

Below is my incomplete code:

    import urllib2, re
    from lxml import etree

    def wgeturl(target):
        # Fetch the page body; return an empty string on any failure.
        try:
            req = urllib2.Request(target)
            req.add_header('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3 Gecko/2008092417 Firefox/3.0.3')
            response = urllib2.urlopen(req)
            outtxt = response.read()
            response.close()
        except Exception:
            return ''
        return outtxt

    newurl = 'http://www.iupui.edu/~webtrain/tutorials/tables.html'  # homepage

    dt = wgeturl(newurl)
    parser = etree.HTMLParser()
    tree = etree.fromstring(dt, parser)

The lxml documentation covers creating an element tree manually, but how can I use the HTML data I have read and parsed (in this example, the variable tree or dt) to access a sub-element? More importantly, is it possible to find a sub-element by its text value?

Let's say that in the example webpage above I want to search for the table "Supplies and Expenses" and generate the XPath expression dynamically from the value "Supplies and Expenses".

Is there any option for this? My ultimate goal is to read a webpage and generate the XPath for a sub-element's text value present in that webpage.

To find elements based on part of their text value:

"//*[contains(text(), 'some_value')]" 

For example, if you have this:

    <div id="somediv">
        <span>something here</span>
        <a href="#">click here</a>
    </div>

You can find all sub-elements containing the word "here" like this:

"//div[@id='somediv']//*[contains(text(), 'here')]" 

Or you can, for example, find only the span elements inside the div that contain the word "something":

"//div[@id='somediv']//span[contains(text(), 'something')]" 

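As a quick check, the two expressions above can be run against the sample markup with lxml. This is only a minimal sketch: the `SNIPPET` constant and the variable names are made up for illustration, and the markup is parsed with `etree.HTMLParser()` as in the question.

```python
from lxml import etree

# Sample markup copied from the example above (SNIPPET is an illustrative name).
SNIPPET = """
<div id="somediv">
    <span>something here</span>
    <a href="#">click here</a>
</div>
"""

# The HTML parser wraps the fragment in <html><body>...</body></html>.
root = etree.fromstring(SNIPPET, etree.HTMLParser())

# Any descendant of the div whose first text node contains "here".
any_here = root.xpath("//div[@id='somediv']//*[contains(text(), 'here')]")

# Only span descendants whose first text node contains "something".
spans = root.xpath("//div[@id='somediv']//span[contains(text(), 'something')]")

print([el.tag for el in any_here])  # ['span', 'a']
print([el.tag for el in spans])     # ['span']
```

Note that in XPath 1.0, `contains(text(), ...)` only looks at the first text node of each element. If the text you are searching for may sit after a child element, `contains(., ...)` tests the element's whole string value instead, at the cost of also matching every ancestor of the element that holds the text.
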
As for parsing in lxml:

    from lxml import etree

    outtxt = response.read()
    # Real-world HTML is rarely well-formed XML, so use the HTML parser
    root = etree.fromstring(outtxt, etree.HTMLParser())
    root.xpath("my_xpath_expression")

Update:

To get the full XPath expression of an element, use the ElementTree.getpath() method, like so:

    tree = etree.ElementTree(root)
    # print the XPath of all elements in 'root'
    for e in root.iter():
        print(tree.getpath(e))
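
Putting the two pieces together, a text search with `contains()` followed by `getpath()` gives the XPath of whichever element holds a given value. The sketch below is one possible way to do it, not a definitive implementation: the `PAGE` markup is a hypothetical stand-in for the downloaded page (the real page's structure may differ), and `xpaths_for_text` is a helper name invented here.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded page; the real markup may differ.
PAGE = """
<html><body>
  <table>
    <tr><th>Category</th><th>Amount</th></tr>
    <tr><td>Supplies and Expenses</td><td>100</td></tr>
  </table>
</body></html>
"""


def xpaths_for_text(root, value):
    """Return the XPath of every element whose first text node contains value."""
    tree = etree.ElementTree(root)
    # Pass the value as an XPath variable so quotes inside it need no escaping.
    matches = root.xpath("//*[contains(text(), $value)]", value=value)
    return [tree.getpath(el) for el in matches]


root = etree.fromstring(PAGE, etree.HTMLParser())
paths = xpaths_for_text(root, "Supplies and Expenses")
print(paths)

# Each generated path can be fed back into xpath() to reach the same element.
for path in paths:
    print(root.xpath(path)[0].text)
```

The paths that `getpath()` returns are positional (element names plus indexes), so they describe the page as it was parsed and may stop matching if the page layout changes.
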
