Running `$ bin/nutch crawl url.txt dir bothtm -depth 2` in Cygwin fails. Is this an environment variable problem? Please help.
crawl started in: crawl-20120506005448
rootUrlDir = bothtm
threads = 10
depth = 2
Injector: starting
Injector: crawlDb: crawl-20120506005448/crawldb
Injector: urlDir: bothtm
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/nutch/bothtm
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
1 answer
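Judging by this error alone, it is clearly a path problem: the log shows Hadoop looking for the input path file:/C:/nutch/bothtm, which does not exist. Hadoop does not create the input path automatically, which is understandable. It has nothing to do with environment variables.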
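To illustrate the point about the input path, here is a minimal sketch of creating the seed directory yourself before starting the crawl. The names `urls`, `seed.txt`, the helper `seed_dir_ready`, and the URL `http://example.com/` are hypothetical and only for illustration. It assumes you run it from the Nutch install directory (C:/nutch in the log above) and that the flags behave as in the commands quoted in this thread.

```shell
#!/bin/sh
# Hypothetical names: the seed directory "urls", the file "seed.txt",
# and the example URL are placeholders, not taken from the question.
SEED_DIR="urls"
CRAWL_DIR="bothtm"

# Return success only if the directory exists and holds a non-empty file.
seed_dir_ready() {
  [ -d "$1" ] || return 1
  for f in "$1"/*; do
    [ -s "$f" ] && return 0
  done
  return 1
}

# Hadoop will not create the input path, so create it by hand.
mkdir -p "$SEED_DIR"
printf '%s\n' 'http://example.com/' > "$SEED_DIR/seed.txt"

if ! seed_dir_ready "$SEED_DIR"; then
  echo "Seed directory $SEED_DIR is missing or empty" >&2
elif [ -x bin/nutch ]; then
  # The seed directory is the first argument; -dir names the output directory.
  bin/nutch crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth 2
else
  echo "bin/nutch not found; run this from the Nutch install directory" >&2
fi
```

The check before the crawl is only there to catch the "Input path does not exist" case early with a readable message; it does not address other causes of a failed job.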
Follow-up question
After the change, it became:
$ bin/nutch crawl url.txt -dir bothtom -depth 3 -topN 100 -threads 1
/*
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
*/
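Follow-up answer
There are many possible causes for this. I suggest looking at my blog, where I wrote up possible ways to handle this situation. I hope it helps.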