A question about text length in Perl. Experts, please help!
#!/usr/bin/env perl
use common::sense;
use Encode;
use Lingua::Stem::Snowball;
use Lingua::StopWords qw(getStopWords);
use Scalar::Util qw(looks_like_number);
my $stemmer = Lingua::Stem::Snowball->new(
    encoding => 'UTF-8',
    lang => 'en',
);
my %stopwords = map {
    lc($_) => 1    # each stop word becomes a hash key; a bare "lc" would pair words up as key/value
} keys %{getStopWords(en => 'UTF-8')};
local $, = ' ';
say map {
    sub {
        my @w =
            map {
                encode_utf8 $_
            } grep {
                length >= 2
                and not looks_like_number($_)
                and not exists $stopwords{lc($_)}
            } split
                /[\W_]+/x,
                shift;
        $stemmer->stem_in_place(\@w);
        map {
            lc decode_utf8 $_
        } @w
    }->($_);
} <>;
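I typed the text I want to process inside the pink angle brackets, but the text is too long, so I get this error: Excessively long <> operator at stem.pl line 38. From what I tried, the error appears as soon as the text is longer than 2 lines, but my text has several hundred lines.
An expert told me I could solve this by working out the length of each line and then writing a script, but I don't know how to write scripts. Is there any other way? Otherwise, I would be very grateful for a script.
The pink angle brackets are the final <> in the code.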
1 answer
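#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use Lingua::Stem::Snowball;
use Lingua::StopWords qw( getStopWords );

# Read the text from a file instead of pasting it inside <> in the source.
open my $in,  '<', './sentence.txt' or die "Cannot open sentence.txt: $!\n";
open my $out, '>', './result.txt'   or die "Cannot open result.txt: $!\n";

# Build the stop-word list and the stemmer once, not once per line.
my $stopwords = getStopWords('en');
my $stemmer   = Lingua::Stem::Snowball->new( lang => 'en' );

while ( my $line = <$in> ) {
    # Split on whitespace and drop stop words, ignoring case.
    my @words = grep { !$stopwords->{ lc $_ } } split ' ', $line;

    # Stem once, in place (the earlier version stemmed every word twice).
    $stemmer->stem_in_place( \@words );

    # Write each line of stems to the screen and to result.txt.
    say join ' ', @words;
    say {$out} join ' ', @words;
}

close $in;
close $out or die "Cannot close result.txt: $!\n";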
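
Some background on the error may help. In Perl, <> is the input operator, and anything written between the angle brackets is parsed as a filehandle name or a glob pattern, not as data, which is why pasting several lines of text there fails with "Excessively long <> operator". The text has to be read at run time instead. Below is a minimal sketch of line-by-line reading. The helper name words_per_line and the sample text are made up for illustration, and an in-memory filehandle stands in for a real file so the sketch runs on its own; it reuses the separator pattern from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle line by line and split each line
# into words, using the same separator pattern as the question's script.
sub words_per_line {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        chomp $line;
        # grep { length } drops the empty field that split produces
        # when a line starts with a separator.
        push @lines, [ grep { length } split /[\W_]+/, $line ];
    }
    return @lines;
}

# Sample text held in a string, so no input file is needed here.
my $text = "The cats are running\nDogs_bark loudly\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

# Print the words of each input line, separated by spaces.
print "@$_\n" for words_per_line($fh);
close $fh;
```

With the script from the question, the same idea needs no code change: leave the final <> empty and name the text file on the command line, for example perl stem.pl sentence.txt > result.txt. An empty <> reads the files listed on the command line, or standard input if none are given.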