C# regular expression for filtering HTML

The source file is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
....
(lots of content in between)
....
<link rel="stylesheet" href="/js/tipbox/msgbox.css" />
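<font class="col_356cea"> (this same string appears a second time further down)
.....
(more content follows below)
I want to delete every character before the first occurrence of the string <font class="col_356cea">, including <font class="col_356cea"> itself. How should I write this regex? I've only just started learning regular expressions, and nothing I try works.
I'm using C#.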
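One way to express this requirement as a regex, sketched under the assumption that the marker is exactly the literal string `<font class="col_356cea">` and that only its first occurrence matters. The class and method names (`FirstMarkerTrimmer`, `RemoveThroughFirstMarker`) are hypothetical. The key points are the `^` anchor (match only from the start of the input), the lazy `[\s\S]*?` (any characters including newlines, stopping at the first marker), and `Regex.Escape` so the marker is matched literally.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class FirstMarkerTrimmer
{
    // The marker string from the question, matched literally.
    private const string Marker = "<font class=\"col_356cea\">";

    // ^          only matches at the very start of the input (no Multiline option)
    // [\s\S]*?   lazily matches any characters, including newlines, up to the FIRST marker
    // Regex.Escape ensures the marker's characters are treated literally
    private static readonly Regex UpToFirstMarker =
        new Regex("^[\\s\\S]*?" + Regex.Escape(Marker));

    // Removes everything before the first marker, plus the marker itself.
    // If the marker does not occur, the input is returned unchanged.
    public static string RemoveThroughFirstMarker(string html)
    {
        return UpToFirstMarker.Replace(html, string.Empty, 1);
    }
}
```

Because the pattern is anchored at the start and lazy, the second occurrence of the marker further down is left untouched.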

维湾教育培训
Recommended answer, 2016-05-28 · Zhidao Partner, software expert

The code is short but does a lot, and it runs efficiently:
// Requires at the top of the file: using System.Text.RegularExpressions;
public static string ClearHtmlCode(string text)
{
    if (string.IsNullOrEmpty(text))   // check before Trim() so a null input does not throw
        return string.Empty;
    text = text.Trim();
    text = Regex.Replace(text, @"(<br\s*/?>)+|<p\b[^>]*>", " ", RegexOptions.IgnoreCase); // <br>, <br/>, <p ...> -> space
    text = Regex.Replace(text, @"(\s*&nbsp;\s*)+", " ", RegexOptions.IgnoreCase);         // &nbsp; -> space
    text = Regex.Replace(text, @"<[^>]*>", string.Empty);                                  // any other tags
    text = Regex.Replace(text, @"\s{2,}", " ");                                            // two or more whitespace chars -> one space
    text = text.Replace("'", "''");                                                        // doubles single quotes (SQL-style escaping)
    return text.Trim();
}


cipherf
2014-11-16

text = Regex.Replace(text, @"<[^>]*?>", "");

In practice, though, this is far from sufficient....

This answer was accepted by users.
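To illustrate why the one-line pattern is not enough: it removes the tags themselves but keeps the text inside `<script>`/`<style>` elements and leaves entities such as `&nbsp;` and `&amp;` undecoded. The sketch below compares it with a slightly fuller variant. It is still a regex heuristic, not an HTML parser, and it assumes reasonably well-formed input. The class and method names (`TagStripper`, `StripTagsOnly`, `StripToText`) are hypothetical; `WebUtility.HtmlDecode` is the standard .NET entity decoder.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TagStripper
{
    // The pattern from the answer: removes anything that looks like a tag.
    public static string StripTagsOnly(string html) =>
        Regex.Replace(html, @"<[^>]*?>", "");

    // A somewhat fuller variant (still a heuristic, not an HTML parser).
    public static string StripToText(string html)
    {
        // 1. Drop <script> and <style> elements together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>[\s\S]*?</\1\s*>", "",
            RegexOptions.IgnoreCase);
        // 2. Drop HTML comments.
        text = Regex.Replace(text, @"<!--[\s\S]*?-->", "");
        // 3. Remove the remaining tags.
        text = Regex.Replace(text, @"<[^>]*>", "");
        // 4. Decode entities such as &amp; and &nbsp; (&nbsp; becomes U+00A0).
        return WebUtility.HtmlDecode(text);
    }
}
```

For input `<p>Tom&nbsp;&amp;&nbsp;Jerry</p><script>alert(1)</script>`, `StripTagsOnly` leaves `Tom&nbsp;&amp;&nbsp;Jerryalert(1)`, while `StripToText` yields `Tom & Jerry` (with non-breaking spaces).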


球员故事汇
2014-11-15

You could try cutting the string with the Substring() and IndexOf() methods.
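A minimal sketch of the IndexOf()/Substring() approach, assuming the marker is a fixed literal string. The class and method names (`MarkerCutter`, `TextAfterFirst`) are hypothetical. `StringComparison.Ordinal` is used so the search compares characters exactly, without culture-specific rules.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class MarkerCutter
{
    // Returns the part of `html` after the first occurrence of `marker`,
    // i.e. drops everything before the marker and the marker itself.
    // Returns the input unchanged when the marker is not found.
    public static string TextAfterFirst(string html, string marker)
    {
        // Ordinal: exact character match, no culture-sensitive comparison.
        int index = html.IndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
            return html;
        return html.Substring(index + marker.Length);
    }
}
```

For a fixed marker this avoids regex escaping altogether, which tends to make it the simpler option; for example, `MarkerCutter.TextAfterFirst(html, "<font class=\"col_356cea\">")`.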
