How can I process HTML tables with Perl and extract the information in them?

tlint
2014-11-01 · Received over 201 upvotes

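Try the HTML::TableExtract module. For example:

Take the China Merchants Bank savings deposit interest rate table at http://www.cmbchina.com/CmbWebPubInfo/InterestRate.aspx?chnl=ckrate

Put an excerpt of that table's HTML into the Perl variable $content and process it. In the excerpt, the column headers are the deposit term followed by ten currencies (RMB, USD, GBP, EUR, JPY, HKD, CAD, CHF, AUD, SGD), and the three data rows are demand deposits, one-day call deposits, and seven-day call deposits.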


#!/usr/bin/perl
use warnings;
use strict;
use HTML::TableExtract;
use Data::Dumper;
# Single-quoted heredoc: the HTML is taken literally, with no variable interpolation.
my $content = <<'EOF';
<table class="tablecontent"    cellpadding="0" cellspacing="1" border="0">
       
                   <thead>
                    <tr>
                        <th>存期</th>
                        <th>人民币</th>
                        <th>美元</th>
                        <th>英镑</th>
                        <th>欧元</th>
                        <th>日元</th>
                        <th>港币</th>
                        <th>加拿大元</th>
                        <th>瑞士法郎</th>
                        <th>澳大利亚元</th>
                         <th>新加坡元</th>
                    </tr>
                </thead>
                   
                        <tr class="">
                        <td>活期</td>
                        <td>0.3850</td>
                        <td>0.0500</td>
                        <td>0.0500</td>
                        <td>0.0050</td>
                        <td>0.0001</td>
                        <td>0.0100</td>
                        <td>0.0100</td>
                        <td>0.0001</td>
                        <td>0.2375</td>
                        <td>0.0001</td>
                        </tr>
                   
                        <tr class="alteritem">
                        <td>通知存款 一天</td>
                        <td>0.8800</td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        <td></td>
                        </tr>
                   
                        <tr class="">
                        <td>通知存款 七天</td>
                        <td>1.4850</td>
                        <td>0.0500</td>
                        <td>0.0500</td>
                        <td>0.0050</td>
                        <td>0.0005</td>
                        <td>0.0100</td>
                        <td>0.0100</td>
                        <td>0.0005</td>
                        <td>0.2625</td>
                        <td>0.0005</td>
                        </tr>
                   
       </table>
EOF
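# Parse every table in $content (no header, depth, or count constraints).
my $te = HTML::TableExtract->new();
$te->parse($content);
for my $ts ($te->table_states) {
    print $ts;    # prints the table object itself, e.g. HTML::TableExtract::Table=HASH(0x...)
    for my $row ($ts->rows) {
        print Dumper $row;    # each row is an array reference of cell texts
    }
}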

Output (empty cells come back as undef):

HTML::TableExtract::Table=HASH(0xa1b7c08)$VAR1 = [
          '存期',
          '人民币',
          '美元',
          '英镑',
          '欧元',
          '日元',
          '港币',
          '加拿大元',
          '瑞士法郎',
          '澳大利亚元',
          '新加坡元'
        ];
$VAR1 = [
          '活期',
          '0.3850',
          '0.0500',
          '0.0500',
          '0.0050',
          '0.0001',
          '0.0100',
          '0.0100',
          '0.0001',
          '0.2375',
          '0.0001'
        ];
$VAR1 = [
          '通知存款 一天',
          '0.8800',
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef
        ];
$VAR1 = [
          '通知存款 七天',
          '1.4850',
          '0.0500',
          '0.0500',
          '0.0050',
          '0.0005',
          '0.0100',
          '0.0100',
          '0.0005',
          '0.2625',
          '0.0005'
        ];
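
If only some of the columns are needed, the module's headers option may be worth a look. Below is a minimal sketch, not a definitive implementation. It assumes a shortened English version of the table above; the sample HTML, the header names Term and CNY, and the %rate_for hash are made up for illustration. With headers set, only tables whose header row matches are extracted, each row is limited to the requested columns in the requested order, and the header row itself is left out.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical, shortened English version of the rate table above.
my $html = <<'EOF';
<table>
<tr><th>Term</th><th>CNY</th><th>USD</th></tr>
<tr><td>Demand</td><td>0.3850</td><td>0.0500</td></tr>
<tr><td>Call 1 day</td><td>0.8800</td><td></td></tr>
</table>
EOF

# Ask only for the "Term" and "CNY" columns, in that order.
my $te = HTML::TableExtract->new( headers => [ 'Term', 'CNY' ] );
$te->parse($html);

# Map each deposit term to its CNY rate.
my %rate_for;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        my ( $term, $rate ) = @$row;
        next unless defined $term;    # skip rows without a term cell
        $rate_for{$term} = $rate;
    }
}

for my $term ( sort keys %rate_for ) {
    printf "%-12s %s\n", $term, $rate_for{$term} // 'n/a';
}
```

Header strings are matched case-insensitively and without anchoring, so a short, distinctive piece of the header text is usually enough.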

du瓶邪
2015-08-08 · Received over 24,000 upvotes
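use strict;
use warnings;

# Pass the HTML file as the first command-line argument.
my $path = shift or die "Usage: $0 FILE\n";
open(my $fh, '<', $path) or die "Cannot open $path: $!";

my ($flag, @key, @val);
while (<$fh>) {
    chomp;
    s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    # Only look at lines between a bare <table> line and its </table> line.
    if (/^<table>$/ .. /^<\/table>$/) {
        # Once the first row has ended, later cells are values, not keys.
        $flag = 1 if /<\/tr>/;
        # Assumes one <td>...</td> cell per line.
        if (/<td>(.+)<\/td>/) {
            if ($flag) { push @val, $1 } else { push @key, $1 }
        }
    }
}
close($fh);

# Pair each first-row cell with the value cell at the same position.
for my $i (0 .. $#key) {
    print $key[$i], ' => ', $val[$i] // '', "\n";
}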
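
The line-by-line approach above expects one cell per line and a bare <table> tag. As a rough sketch of how the same idea could be stretched to several cells per line, attributes on tags, and <th> header cells, using only core Perl: the sample HTML and variable names here are made up for illustration, and regex-based HTML handling stays fragile (nested tables or unclosed tags will break it), so a real parser is probably safer for anything beyond simple pages.

```perl
use strict;
use warnings;

# Hypothetical sample input; real pages are usually messier than this.
my $html = <<'EOF';
<table class="rates">
<tr><th>Term</th><th>CNY</th><th>USD</th></tr>
<tr><td>Demand</td><td>0.3850</td><td>0.0500</td></tr>
<tr><td>Call 1 day</td><td>0.8800</td><td></td></tr>
</table>
EOF

my @rows;
# Visit each <tr>...</tr> block; /s lets . match newlines, /i ignores tag case.
while ( $html =~ m{<tr\b[^>]*>(.*?)</tr>}gis ) {
    my $tr = $1;
    my @cells;
    # Collect the text of every <td> or <th> cell in this row.
    while ( $tr =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis ) {
        my $text = $1;
        $text =~ s/<[^>]+>//g;      # drop any nested tags
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @cells, $text;
    }
    push @rows, \@cells;
}

# Treat the first row as the header and pair it with every later row.
my ( $header, @data ) = @rows;
for my $row (@data) {
    for my $i ( 0 .. $#$header ) {
        print "$header->[$i] => ", $row->[$i] // '', "\n";
    }
    print "\n";
}
```

Keeping each row as its own array reference, instead of one flat list of values, lets every data row be paired with the header independently.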