How do I use a regular expression in Python to extract the content between <p> tags in XML?

<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
(Note that there are line breaks between <p> and </p> above.) How can I use a regular expression to end up with a list in which each item is the content between one <p> and its </p>, i.e. list=['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the <xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. ','(A) R1 cells were cultured for 5 days in the presence of <xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic> <xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). ']


3 answers


空空的差别
2018-05-10


# Code
import re

html_text = '''
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
'''

# The group captures only what lies between <p> and </p>. re.S lets '.' also
# match '\n', and the non-greedy '.*?' stops at the nearest </p>.
pattern = r'<p>(.*?)</p>'
matches = re.findall(pattern, html_text, re.S)
# Collapse every run of whitespace (including the line breaks) into one space.
text = [' '.join(m.split()) for m in matches]
print(text)

# Output
['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the <xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> , ,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells.',
 '(A) R1 cells were cultured for 5 days in the presence of <xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic> <xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml).']
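The line breaks inside each <p> are what make this question tricky: by default, `.` in a Python regular expression matches any character except `\n`, so `<p>(.*?)</p>` finds nothing when a paragraph spans several lines. A second pitfall is greediness: with `.*` instead of `.*?`, one match runs from the first <p> to the last </p>. A minimal sketch of both effects, on made-up strings:

```python
import re

# A made-up paragraph that spans two lines, like the ones in the question.
sample = "<p>first line\nsecond line</p>"
pattern = r"<p>(.*?)</p>"

# By default '.' matches anything except '\n', so the match fails.
without_flag = re.findall(pattern, sample)
# With re.DOTALL (same as re.S), '.' matches '\n' too.
with_flag = re.findall(pattern, sample, re.DOTALL)

print(without_flag)  # → []
print(with_flag)     # → ['first line\nsecond line']

# Two paragraphs: greedy '.*' swallows everything up to the LAST </p>,
# while non-greedy '.*?' stops at the nearest one.
two = "<p>a\nb</p>\n<p>c</p>"
greedy = re.findall(r"<p>(.*)</p>", two, re.DOTALL)
lazy = re.findall(r"<p>(.*?)</p>", two, re.DOTALL)

print(greedy)  # → ['a\nb</p>\n<p>c']
print(lazy)    # → ['a\nb', 'c']
```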


俏丽又透彻的多宝鱼q
2015-08-12


I'd suggest parsing the XML directly with Python's BeautifulSoup and not using regex matching at all!

This answer was accepted by the asker and other users.
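A minimal sketch of the BeautifulSoup approach, assuming the third-party `bs4` package is installed (`pip install beautifulsoup4`) and using Python's built-in "html.parser" backend (BeautifulSoup's "xml" backend would additionally need `lxml`). `decode_contents()` returns the markup inside each <p> as a string, so the nested <xref> and <italic> tags are kept, as in the list the asker wanted; collapsing the whitespace is an added choice, not something the answer specified.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# The fragment from the question: two <p> elements that span several lines.
xml_text = """
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
"""

# "html.parser" is Python's built-in parser, so lxml is not required here.
soup = BeautifulSoup(xml_text, "html.parser")

# decode_contents() gives the markup inside each <p>, keeping nested tags;
# split()/join() turns the line breaks into single spaces.
contents = [" ".join(p.decode_contents().split()) for p in soup.find_all("p")]
print(contents)
```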


听雨婷2Y
2018-05-10


Wouldn't it be more convenient to read the XML directly with a Python library?
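The standard library can do this without any extra packages. A minimal sketch with `xml.etree.ElementTree`, assuming the fragment is otherwise well-formed XML: because it has two top-level <p> elements, it is wrapped in a made-up <root> element before parsing, and `inner_xml` is a hypothetical helper that rebuilds the markup between each <p> and </p>.

```python
import xml.etree.ElementTree as ET

# The fragment from the question: two <p> elements that span several lines.
xml_text = """
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone.0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells. </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
<xref ref-type="bibr" rid="pone.0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
<xref ref-type="bibr" rid="pone.0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone.0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml). </p>
"""


def inner_xml(elem):
    """Return the markup between an element's start and end tags (hypothetical helper)."""
    # elem.text is the text before the first child element; ET.tostring()
    # serializes each child *including* its tail, i.e. the text after it.
    parts = [elem.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


# Two top-level <p> elements are not a well-formed document on their own,
# so wrap the fragment in a made-up <root> element before parsing.
root = ET.fromstring("<root>" + xml_text + "</root>")

# Collapse the line breaks (and any other whitespace runs) into single spaces.
contents = [" ".join(inner_xml(p).split()) for p in root.iter("p")]
print(contents)
```

Unlike the regex, a parser does not care how the tags are split across lines or whether a <p> carries attributes.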
