How do I remove the http links at the end of body text in Python?


3 answers


催命鬼儿xp
2022-07-12 · received over 325 likes


import re

from bs4 import BeautifulSoup

# Prefix prepended to every image path that contains "img/".
PREFIX = "/go/${basefact9uu99.currentMediaVersion}/css/QuansuCss/AE/2022/dxbpek2022/EN"


def check_flag(flag):
    """Return True if the path contains 'img/'."""
    return re.search(r'img/', flag) is not None


# Parse the input page; the with-block closes the file once parsing is done.
with open('index.html', 'r', encoding='utf-8') as file:
    soup = BeautifulSoup(file, 'html.parser')

for element in soup.find_all('img'):
    if 'src' in element.attrs:
        print(element.attrs['src'])
        if check_flag(element.attrs['src']):
            element.attrs['src'] = PREFIX + element.attrs['src']

print("##################################")
with open('indexmichenT8.html', 'w', encoding='utf-8') as fp:
    fp.write(soup.prettify())  # prettify() re-indents the parsed tree so the output is readable


安贞高峰
2018-04-13 · received over 3068 likes

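Work through the following points:
1) how to replace whitespace
2) how to handle layout and indentation
3) tags that need special handling, such as <h1> and <p>
4) table layout
5) CSS handling
Of course, you can also simply use the regular expression below (this leaves some of those problems unhandled):
html = re.sub(r"(?isu)<[^>]+>", " ", html)
This strips the tags, although the result will certainly not be ideal.
Note: the only thing this requires is the re module (import re).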
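
As a rough illustration of the one-line regular expression above, here is a minimal sketch. The helper name strip_tags and the sample string are assumptions made for the example, and the extra whitespace-collapsing step is an addition that only touches point 1 of the list; indentation, tables and CSS remain unhandled.

```python
import re

# Same idea as the one-liner above: anything between "<" and ">" counts as a tag.
# The "u" flag is omitted because it is already the default for str patterns in Python 3.
TAG_RE = re.compile(r"(?is)<[^>]+>")


def strip_tags(html):
    """Replace every HTML tag with a space, then collapse runs of whitespace."""
    text = TAG_RE.sub(" ", html)
    return re.sub(r"\s+", " ", text).strip()


sample = "<h1>Title</h1>\n<p>Body   text</p>"
print(strip_tags(sample))
```

A regex like this does not understand comments, scripts or a literal ">" inside attribute values, so an HTML parser is likely the safer choice for real pages.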


This answer was accepted by other users.







丶不如不问
2018-04-12 · received over 3474 likes


Use Python's built-in re module and match the Chinese characters with a regular expression.
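
One possible reading of this suggestion, sketched under stated assumptions: the function names, both patterns and the sample text are hypothetical and do not come from the answer. han_runs keeps only runs of Chinese characters (the basic CJK Unified Ideographs block), while strip_trailing_links takes the more direct route of deleting http/https URLs that sit at the very end of the text.

```python
import re

# Assumption: "Chinese characters" means the basic CJK Unified Ideographs block.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")
# Assumption: a trailing link is one or more http(s) URLs, optionally separated
# by whitespace, that run up to the end of the string.
TRAILING_URLS = re.compile(r"(?:\s*https?://\S+)+\s*$")


def han_runs(text):
    """Return every run of consecutive Chinese characters in text."""
    return HAN_RUN.findall(text)


def strip_trailing_links(text):
    """Remove URLs at the end of text; links elsewhere are left alone."""
    return TRAILING_URLS.sub("", text)


sample = "今天天气很好 http://t.cn/abc https://example.com/x"
print(han_runs(sample))
print(strip_trailing_links(sample))
```

Matching only Chinese characters also drops punctuation, digits and Latin text, so the second function is probably closer to what the question asks for. Note that \S+ treats any non-space characters glued to a URL as part of it.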
