Java regular expression multi-line matching
<DATELINE> CLEVELAND, Feb 26 - </DATELINE><BODY>Standard Oil Co and BP North America
Inc said they plan to form a venture to manage the money market
borrowing and investment activities of both companies.
and will be operated by Standard Oil under the oversight of a
joint management committee.
Reuter
</BODY></TEXT>
</REUTERS>
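I want to match the content between <BODY> and </BODY>. I tried the code below, but it doesn't work: tags that are not on the same line don't seem to match.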
import java.io.*;
import java.util.regex.*;

public class Regextract {
    public static void main(String args[]) throws IOException {
        String str = null;
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("E:\\科研资料\\数据集\\路透社\\reuters21578\\test.txt")));
        while ((str = br.readLine()) != null) {
            // str += "\n";
            extract(str);
        }
    }

    public static void extract(String str) {
        Pattern p = Pattern.compile("(?<=<(BODY)>)(?:.|[\r\n])*(?=<\\/\\1>)", Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Any guidance from the experts would be much appreciated!
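The two flags in the question do different things, and neither lets a pattern see beyond the single string it is given: Pattern.MULTILINE only makes ^ and $ also match at line breaks, and Pattern.DOTALL only makes . also match line terminators. A minimal sketch showing each flag on a two-line string (the class name RegexFlagsDemo and the sample text are illustrative, not from the question):

```java
import java.util.regex.Pattern;

public class RegexFlagsDemo {
    // Returns true if the regex finds a match anywhere in the input.
    public static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";

        // MULTILINE only changes ^ and $: they may also match right after/before a line break.
        System.out.println(finds("^second", 0, text));                 // false
        System.out.println(finds("^second", Pattern.MULTILINE, text)); // true

        // DOTALL only changes '.': it may also match a line terminator.
        System.out.println(finds("line.second", 0, text));              // false
        System.out.println(finds("line.second", Pattern.DOTALL, text)); // true
    }
}
```

Both flags act within one input string, so they cannot help when each call to extract receives only one line.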
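The problem is in this part:
while ((str = br.readLine()) != null) {
    // str += "\n";
    extract(str);
}
This runs the match one line at a time, so a <BODY> ... </BODY> pair that spans several lines never appears in a single string.
Try changing it to this: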
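StringBuilder content = new StringBuilder();
while ((str = br.readLine()) != null) {
    // readLine() strips the line terminator, so put a newline back
    content.append(str).append("\n");
}
extract(content.toString());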
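To see why the per-line loop finds nothing while the whole-text version works, here is a self-contained sketch that runs the question's own pattern on a short sample, once line by line and once on the joined text. The class name PerLineVsWholeText and the sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineVsWholeText {
    // The pattern from the question: look-behind for <BODY>, look-ahead for </BODY>.
    static final Pattern BODY = Pattern.compile(
            "(?<=<(BODY)>)(?:.|[\r\n])*(?=<\\/\\1>)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Collects every match of BODY in one input string.
    public static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = BODY.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String doc = "<BODY>Standard Oil Co and BP\nReuter\n</BODY></TEXT>";

        // One line at a time, as readLine() delivers it: no line holds both tags.
        int perLine = 0;
        for (String line : doc.split("\n")) {
            perLine += matches(line).size();
        }
        System.out.println(perLine);             // 0

        // The whole text at once: both tags are now in the same string.
        System.out.println(matches(doc).size()); // 1
    }
}
```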
Follow-up question:
It compiles now,
but once there is more content to match, I get:
Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.codePointAt(Unknown Source)
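Follow-up answer:
A StackOverflowError means the thread's call stack ran out, not the heap, so giving Java more heap memory will not help. The likely cause is the (?:.|[\r\n])* part of the pattern: java.util.regex matches a repeated alternation recursively, so the stack depth grows with the length of the text being matched. Since Pattern.DOTALL already makes . match line breaks, <BODY>(.*?)</BODY> with Pattern.DOTALL does the same job without the alternation. Raising the stack size (for example with the JVM option -Xss) only moves the limit. If you only need the BODY content, you can also read the file line by line, find the lines containing <BODY> and </BODY>, and read out the content between the two tags.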
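The line-by-line alternative can be sketched as follows, assuming the tags are written exactly as <BODY> and </BODY> and are never nested. It keeps a buffer open from the line containing <BODY> until the line containing </BODY>, so neither a regex nor a whole-file string is needed. The class and method names are hypothetical, and a StringReader stands in for the file reader from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineByLineBody {
    private static final String OPEN = "<BODY>";
    private static final String CLOSE = "</BODY>";

    // Returns the text between each <BODY> ... </BODY> pair, reading one line at a time.
    // Line breaks inside a body are re-joined with '\n' (readLine() strips the originals).
    public static List<String> extractBodies(BufferedReader br) throws IOException {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a BODY element
        String line;
        while ((line = br.readLine()) != null) {
            String rest = line;
            while (true) {
                if (current == null) {
                    int open = rest.indexOf(OPEN);
                    if (open < 0) break;                  // no body starts in the rest of this line
                    current = new StringBuilder();
                    rest = rest.substring(open + OPEN.length());
                } else {
                    int close = rest.indexOf(CLOSE);
                    if (close < 0) {                      // body continues on the next line
                        current.append(rest).append('\n');
                        break;
                    }
                    current.append(rest, 0, close);       // text up to </BODY>
                    bodies.add(current.toString());
                    current = null;
                    rest = rest.substring(close + CLOSE.length());
                }
            }
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        String doc = "<DATELINE> CLEVELAND, Feb 26 - </DATELINE><BODY>Standard Oil Co\n"
                + "joint management committee.\n"
                + "Reuter\n"
                + "</BODY></TEXT>\n";
        for (String body : extractBodies(new BufferedReader(new StringReader(doc)))) {
            System.out.println(body);
        }
    }
}
```

Memory use stays proportional to the largest single body rather than the whole file.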
String str = "";   // start empty: starting from null would prepend the text "null"
BufferedReader br = new ....;
while (true) {
    String line = br.readLine();
    if (line == null) break;
    str += line + "\r\n";
}
extract(str);
This answer was accepted by a user.
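Whichever way the text is assembled, the extract pattern itself can still fail: the (?:.|[\r\n])* alternation is the likely source of the StackOverflowError reported in the follow-up, and the greedy * would run from the first <BODY> to the last </BODY> when the input holds several articles. A sketch of a replacement extract, assuming the input is already one string; the class name BodyRegex is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyRegex {
    // DOTALL lets '.' cross line breaks, so no (.|[\r\n]) alternation is needed.
    // The reluctant '.*?' stops at the nearest </BODY> instead of the last one.
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    // Returns the text of every <BODY> element in the input.
    public static List<String> extract(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BODY.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 = content without the tags
        }
        return bodies;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("<BODY>first\narticle\n</BODY></TEXT>\n");
        sb.append("<BODY>second article</BODY>\n");
        // A long body, to exercise the pattern on large input.
        sb.append("<BODY>").append("x".repeat(200_000)).append("</BODY>");

        List<String> bodies = extract(sb.toString());
        System.out.println(bodies.size()); // 3
        System.out.println(bodies.get(1)); // second article
    }
}
```

Capturing group 1 instead of using look-behind and look-ahead keeps the pattern simpler and returns only the content between the tags.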