Scrapy: extracting text that is not inside a tag

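The text I want to scrape is:
“黄念祖：亚运会劫机事件中只有三个人活着，出事时都在念佛。” ("Huang Nianzu: only three people survived the Asian Games hijacking incident, and all of them were reciting the Buddha's name when it happened.")
But with response.xpath('//*[@id="result"]/div[3]/div[1]/h2/a/text()').extract() I only get
“黄念祖：亚运会，事件中只有三个人活着，出事时都在念佛”
The word 劫机 ("hijacking") is missing, and there is an extra comma I don't want. How can I fix this?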
http://search.sina.com.cn/?by=title&q=%BD%D9%BB%FA&c=blog&range=article&col=&source=&from=&country=&size=&time=&a=&sort=time
This is the original URL.
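A likely explanation, offered as an assumption rather than something verified against the live page: the search query in the URL (q=%BD%D9%BB%FA, the GBK encoding of 劫机) is probably highlighted by wrapping it in a child element inside the <a>. XPath `text()` returns only the <a>'s direct text nodes, i.e. the pieces before and after that child, and `extract()` returns them as a two-item list, which is where the extra comma comes from. The sketch below uses hypothetical markup (the `<font color="red">` wrapper is assumed) and Python's standard-library ElementTree to imitate the two behaviours: the hypothetical helper `direct_text_nodes` mimics `a/text()`, while `full_text` mimics XPath `string(.)`. In Scrapy the corresponding fix would be something like `response.xpath('//*[@id="result"]/div[3]/div[1]/h2/a').xpath('string(.)').extract_first()`, or `''.join(...)` over `.../h2/a//text()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: assumes the page wraps the search keyword 劫机
# in a child element (here <font color="red">) for highlighting.
HTML = '<h2><a href="#">“黄念祖：亚运会<font color="red">劫机</font>事件中只有三个人活着，出事时都在念佛。”</a></h2>'

a = ET.fromstring(HTML).find("a")


def direct_text_nodes(elem):
    """Mimic XPath a/text(): only text nodes that are direct children of elem."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        # Text after a child element is stored as that child's .tail
        if child.tail:
            nodes.append(child.tail)
    return nodes


def full_text(elem):
    """Mimic XPath string(.): all descendant text joined in document order."""
    return "".join(elem.itertext())


print(direct_text_nodes(a))  # two pieces, 劫机 is missing
print(full_text(a))          # the complete sentence
```

The first call returns two strings with 劫机 dropped, which matches the symptom in the question; the second keeps the highlighted keyword because it walks all descendant text.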

1 answer

Anonymous user
2016-05-07

The code is as follows:

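# parse() belongs inside a scrapy.Spider subclass. It apparently reads an
# element with id="info" whose values are partly in tags, partly bare text.
def parse(self, response):
    states = {}
    list1 = []  # labels whose value is a bare text node outside any tag
    list2 = []  # those bare text values, in document order

    for row in response.xpath("//*[@id='info']/*"):
        if row.xpath("span[@class='pl']/text()"):
            # Label in <span class="pl">, value inside an <a> element
            title = row.xpath("span[@class='pl']/text()").get("").strip()
            text = row.xpath("a/text()").get("").strip()
            states[title] = text
        elif row.xpath("text()"):
            # Label-only element; [:-1] drops the last character
            # (presumably a trailing colon)
            list1.append(row.xpath("text()").get("").strip()[:-1])

    # Direct text children of #info, i.e. text that is not inside any tag
    for node in response.xpath("//*[@id='info']/text()").getall():
        if node.strip():
            list2.append(node.strip())

    # Pair each label with the bare value at the same position
    for label, value in zip(list1, list2):
        states[label] = value

    for key, value in states.items():
        print(key, value)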


This answer was accepted by the asker.