Help: how do I search for a specific word with hadoop-mapreduce-examples-2.7.3.jar?
The previous two blog posts both used this jar when testing Hadoop code, so it is well worth digging into its source.
Before reading the source, it helps to first write a WordCount of our own. The code is as follows:
package mytest;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each input line into tokens and emit (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum up the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: output dir (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
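For intuition, given a hypothetical input file containing the two lines

hello hadoop
hello world

the job writes one tab-separated (word, count) pair per line to the output directory:

hadoop	1
hello	2
world	1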
Checking the relevant source shows that compiling it needs two jars: hadoop-common-2.7.0.jar and hadoop-mapreduce-client-core-2.7.0.jar.
After exporting the project from MyEclipse as a Runnable JAR, run:

~/hadoop-2.7.0/bin/hadoop jar my.jar mytest.WordCount /user/hadoop/input /user/hadoop/output3

The test succeeds.
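If you would rather not use MyEclipse, a plain command-line build along these lines should also work (the jar paths below assume the stock 2.7.0 layout under /home/hadoop):

mkdir classes
javac -classpath /home/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar:/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar -d classes WordCount.java
jar cf my.jar -C classes .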
Because the class lives in package mytest, the fully qualified name mytest.WordCount must be used when running it.
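As a sanity check, running it without the package prefix, e.g.

~/hadoop-2.7.0/bin/hadoop jar my.jar WordCount /user/hadoop/input /user/hadoop/output3

should fail with java.lang.ClassNotFoundException: WordCount, since the main class is looked up by its fully qualified name.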
Recall, however, that the earlier commands ran fine without any mytest.-style package prefix. Let's dig into the source to see why. First locate the examples jar and its sources:

find ~/ -name "*hadoop-mapreduce-examples*"

The output is:

/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar
/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.0-sources.jar
/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.0-test-sources.jar
/home/hadoop/hadoop-2.7.0/share/doc/hadoop/hadoop-mapreduce-examples
Unpack hadoop-mapreduce-examples-2.7.0-sources.jar and import it into MyEclipse to browse the source.
Searching the source for the string "grep" shows it appears in ExampleDriver.java, which looks like the entry point of the jar.
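This also answers the question in the title: the grep program registered there searches the input for lines matching a regular expression, taking an input dir, an output dir, and the pattern. Assuming the word to look for is hello (a placeholder), the call would look like:

~/hadoop-2.7.0/bin/hadoop jar ~/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep /user/hadoop/input /user/hadoop/grep-out hello

The output directory then lists each match together with its count.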
So how does a Runnable JAR know which class is its entry point? Unpacking the jar reveals the following in META-INF/MANIFEST.MF:

Main-Class: org.apache.hadoop.examples.ExampleDriver
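You can also read the manifest without unpacking the whole jar, for example:

unzip -p /home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar META-INF/MANIFEST.MF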
So a Runnable JAR can carry a default entry point, and MyEclipse lets you set it when exporting the jar.
Import ExampleDriver.java into your own project, tweak it a little, and test. Run:

~/hadoop-2.7.0/bin/hadoop jar my.jar wordcount /user/hadoop/input /user/hadoop/output4
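For reference, a minimal sketch of such a trimmed-down driver might look like this (MyDriver is a hypothetical name; it keeps only the wordcount command, whereas ExampleDriver registers many more programs via org.apache.hadoop.util.ProgramDriver):

package mytest;

import org.apache.hadoop.util.ProgramDriver;

// A stripped-down ExampleDriver: maps a command name to a main class,
// so "hadoop jar my.jar wordcount in out" works without a package prefix.
public class MyDriver {
    public static void main(String[] args) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
            // Register our WordCount under the command name "wordcount".
            pgd.addClass("wordcount", WordCount.class,
                    "A map/reduce program that counts the words in the input files.");
            exitCode = pgd.run(args);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.exit(exitCode);
    }
}

Export the jar with Main-Class set to mytest.MyDriver and it accepts wordcount as a sub-command, just like the stock examples jar.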
Much of the detail is best seen in the source itself; anything unusual can be analyzed in depth later.
Tip: inspecting the logs shows where output goes.
System.out.println output from map and reduce ends up in the task logs.
System.out.println output from main is printed to the console.
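A quick way to confirm this is to drop a probe into each spot:

// Inside TokenizerMapper.map(): appears in the task's stdout log
// (view it via "yarn logs -applicationId <appId>" or the web UI).
System.out.println("map saw: " + value);

// Inside main(): appears directly in the terminal that launched the job.
System.out.println("job starting");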