Parsing HTML in Java

Hi everyone, could someone tell me how to parse HTML in Java, read out its contents, and then save them to a database? The more detail, the better.

alovelyella
Recommended answer · 2016-07-02

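Find the page's source file and strip out the HTML markup, and that's it. Here's a method I wrote. It's not great and still needs some work (*^__^*) hehe: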
import java.util.regex.Pattern;

public class HtmlToText {

    // <script>...</script> blocks, contents included
    private static final Pattern P_SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);
    // <style>...</style> blocks, contents included
    private static final Pattern P_STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);
    // any remaining HTML tag, e.g. <p>, </div>, <br/>
    private static final Pattern P_HTML = Pattern.compile("<[^>]+>");
    // character entities such as &nbsp; &lt; &#160; (bounded, so a lone "&" in the
    // text can no longer swallow everything up to the next ";")
    private static final Pattern P_SPE = Pattern.compile("&#?[a-zA-Z0-9]+;");
    // runs of spaces, tabs and line breaks (\r\n included)
    private static final Pattern P_BLANK = Pattern.compile("\\s+");

    public String HtmlToTextGb2312(String inputString) {
        if (inputString == null) {
            return "";
        }
        String htmlStr = inputString; // string that still contains HTML tags
        htmlStr = P_SCRIPT.matcher(htmlStr).replaceAll(""); // remove script blocks
        htmlStr = P_STYLE.matcher(htmlStr).replaceAll("");  // remove style blocks
        htmlStr = P_HTML.matcher(htmlStr).replaceAll("");   // remove remaining tags
        htmlStr = P_SPE.matcher(htmlStr).replaceAll("");    // remove special-character entities
        htmlStr = P_BLANK.matcher(htmlStr).replaceAll(" "); // collapse spaces, tabs and newlines
        return htmlStr.trim(); // plain-text result
    }
}
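To see what each step does, here is a self-contained sketch of the same pipeline run on a small sample page. It follows the answer's order (script/style blocks, then tags, then entities, then whitespace). As choices of this sketch only, it merges the script and style patterns into one using a back-reference, and it replaces entities with a space instead of deleting them so that `&nbsp;` does not glue two words together. The class name `HtmlStripDemo` is hypothetical.

```java
import java.util.regex.Pattern;

public class HtmlStripDemo {

    // <script>...</script> or <style>...</style>; \1 requires the closing tag
    // to match the opening one (case-insensitively, because of the flag).
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<\\s*(script|style)[^>]*>[\\s\\S]*?<\\s*/\\s*\\1\\s*>",
            Pattern.CASE_INSENSITIVE);
    // any remaining tag
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // named or numeric entities such as &nbsp; &amp; &#160;
    private static final Pattern ENTITY = Pattern.compile("&#?[a-zA-Z0-9]+;");
    // any run of whitespace
    private static final Pattern BLANK = Pattern.compile("\\s+");

    public static String strip(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(""); // drop code/CSS with its contents
        s = TAG.matcher(s).replaceAll("");                        // drop the tags themselves
        s = ENTITY.matcher(s).replaceAll(" ");                    // entities become a space
        return BLANK.matcher(s).replaceAll(" ").trim();           // one space between words
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style>"
                + "<script>var x = 1 < 2;</script></head>"
                + "<body><p>Hello,&nbsp;<b>world</b></p>\n\t<p>Second line</p></body></html>";
        System.out.println(strip(html)); // prints "Hello, world Second line"
    }
}
```

Regex stripping is a quick fix rather than a real parser: a `>` inside an attribute value (for example `<a title="a > b">`) ends the tag pattern early and leaves `b">` in the output, and an HTML comment containing `>` is cut short the same way. For arbitrary pages, a proper HTML parser is more robust.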


This answer was accepted by the asker.
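The answer starts from "find the source file" but does not show how to load that file into the `String` the method expects. Below is a minimal sketch, assuming the page is saved as a local file. The class name `HtmlSourceReader`, the default file name `page.html`, and the default GB2312 charset are hypothetical placeholders; the charset is only suggested by the method's name. Decoding with an explicit charset avoids garbled Chinese text when the page is not in the platform's default encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlSourceReader {

    // Loads a whole HTML file into a String, decoding it with the given charset.
    // The charset should match the page's <meta charset> or HTTP Content-Type.
    public static String readHtml(Path file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the real path and charset name as arguments.
        Path file = Paths.get(args.length > 0 ? args[0] : "page.html");
        Charset charset = Charset.forName(args.length > 1 ? args[1] : "GB2312");
        String html = readHtml(file, charset);
        System.out.println("Read " + html.length() + " characters from " + file);
    }
}
```

The returned string is what `HtmlToTextGb2312` takes as its `inputString`.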