XPath: getting all the text under a node
<td class="index">
<div>
<a href="baidu.com">123<b>456</b>789</a>
<a href="qq.com"></a>
</div>
</td>
Given the markup above, I want to get the string "123456789".
My current code is:
i.xpath(".//td[@class='index']/div/a[1]/text()").extract()   # returns ['123', '789']
How can I get ['123456789'] instead?
2 answers
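I want to extract the text "mrlevo520". There are many ways to do it: bs4 works, regular expressions work, and driving a browser with selenium works too. This time I want to try XPath, first because it carries over to selenium and XPath really is powerful, and second because Firefox's Firebug plugin can pinpoint the tag that holds the content you need in one step. Enough talk, here is the program.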
import urllib2
from lxml import etree
crawl_url = "http://www.jianshu.com/p/e2c4ebd2eeb3"
req = urllib2.Request(crawl_url)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36')
response = urllib2.urlopen(req)
html = response.read()
selector = etree.HTML(html)
# Core part
bloger = selector.xpath("//a[@class='author-name blue-link']")
info = bloger[0].xp
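The last line of the program above is cut off in the source, so the author's own call cannot be recovered here. As a minimal sketch of one common way to get all the text under a node with lxml, the XPath string() function concatenates every descendant text node of the context node. The markup below is a tidied copy of the snippet from the question (wrapped in a hypothetical table/tr so the parser keeps the td, and with href in place of the original attribute spellings). The variable names are illustrative assumptions, and the code targets Python 3.

```python
from lxml import etree

# Tidied copy of the question's markup (assumption: table/tr wrapper added,
# attributes spelled href, tags closed properly).
html = """
<table><tr>
<td class="index">
<div>
<a href="baidu.com">123<b>456</b>789</a>
<a href="qq.com"></a>
</div>
</td>
</tr></table>
"""

root = etree.HTML(html)

# Select the first <a> inside the td, as in the question's XPath.
first_link = root.xpath("//td[@class='index']/div/a[1]")[0]

# text() only matches text nodes that are direct children of <a>,
# so the "456" inside <b> is skipped.
direct_text = first_link.xpath("text()")

# string(.) concatenates all descendant text nodes of the context node.
full_text = first_link.xpath("string(.)")

# Equivalent without XPath: join every text fragment under the element.
joined = "".join(first_link.itertext())

print(direct_text)  # ['123', '789']
print(full_text)    # 123456789
print(joined)       # 123456789
```

With the Scrapy-style selector from the question, wrapping the path in string(), as in i.xpath("string(.//td[@class='index']/div/a[1])").extract(), should likewise return a single-item list, although that has not been tested against the asker's page.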