How do I get all the text between two specified tags using BeautifulSoup?
1 answer
Your HTML is not well-formed XML (the tags are not properly paired), so you have to use an HTML parser:
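from bs4 import BeautifulSoup

s = """
</span><span style= 'font-size:12.0pt;color:#CC3399'>714659079qqcom 2014/09/10 10:14</span></p></div>
"""
# html.parser tolerates the unmatched closing tags in this fragment
soup = BeautifulSoup(s, "html.parser")
print(soup)             # the markup as the parser understood it
print(soup.get_text())  # only the text, with every tag removed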
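The snippet above returns all the text in the fragment. If what you want is only the text that sits between two particular tags, one possible approach is to walk the siblings that follow the first tag and stop at the second. This is only a rough sketch: the sample markup, the ids "a" and "b", and the helper name text_between are made up for illustration, and it assumes the two tags are siblings under the same parent.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def text_between(start, end):
    """Return the text of every sibling node located after `start` and before `end`."""
    parts = []
    for node in start.next_siblings:
        if node is end:
            break
        # Tags contribute their nested text; bare strings contribute themselves.
        text = node.get_text() if isinstance(node, Tag) else str(node)
        if text.strip():
            parts.append(text.strip())
    return "\n".join(parts)


# Hypothetical sample markup; the ids "a" and "b" are invented for this sketch.
html = (
    "<div><h2 id='a'>Start</h2>"
    "<p>first <b>bold</b></p><p>second</p>"
    "<h2 id='b'>End</h2><p>after</p></div>"
)
soup = BeautifulSoup(html, "html.parser")
start = soup.find(id="a")
end = soup.find(id="b")
result = text_between(start, end)
print(result)
```

If the two tags are not siblings, this will not stop at the second tag, so you would need to iterate over start.next_elements instead and skip the nodes nested inside the start tag.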
If you would rather use a regular expression, just match the tags and strip them out:
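import re

s = """
</span><span style= 'font-size:12.0pt;color:#CC3399'>714659079qqcom 2014/09/10 10:14</span></p></div>
"""
# a tag is "<", then one or more characters that are not ">", then ">"
dr = re.compile(r'<[^>]+>', re.S)
dd = dr.sub('', s)  # replace every tag with an empty string
print(dd)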
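Stripping every tag gives you all the text at once. If you only need what lies between one specific pair of tags, a capturing group may be enough. The sketch below uses only the standard library and assumes the span elements are not nested inside each other; the names span_text and matches are illustrative.

```python
import html
import re

s = """
</span><span style= 'font-size:12.0pt;color:#CC3399'>714659079qqcom 2014/09/10 10:14</span></p></div>
"""

# Capture whatever sits between an opening <span ...> and the next </span>.
# The non-greedy (.*?) stops at the first closing tag; re.S lets "." cross newlines.
span_text = re.compile(r"<span\b[^>]*>(.*?)</span>", re.S | re.I)

# html.unescape turns entities such as &amp; back into plain characters.
matches = [html.unescape(m).strip() for m in span_text.findall(s)]
print(matches)
```

A regular expression like this does not understand nesting, so for anything beyond simple fragments the parser-based approach is usually the safer choice.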