How do I run a MapReduce2 program remotely from Windows?

Question

MarkizaSckuza @MarkizaSckuza

Hello.

I want to write a simple MapReduce v2 (Hadoop YARN) example and run it remotely.

What I have done so far:

1. Installed the Hortonworks Sandbox in VirtualBox on my machine. The connection works: opening http://localhost:8888 shows the Hadoop start page.
2. Wrote a simple "word count" example:

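import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {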

    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(WordCount.class);


        job.set("yarn.resourcemanager.address", "hdfs://localhost:8032");
        job.set("yarn.nodemanager.address", "hdfs://localhost:8041");
        job.set("yarn.nodemanager.localizer.address", "hdfs://localhost:8040");
        job.set("mapreduce.jobhistory.address", "hdfs://localhost:10020");

        job.set("fs.defaultFS", "hdfs://localhost:8020");
        job.set("hbase.zookeeper.quorum", "hdfs://localhost:2888");

        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path("text"));
        FileOutputFormat.setOutputPath(job, new Path("output.txt"));

        JobClient.runJob(job);
    }

    public static class Map implements Mapper<LongWritable, org.apache.hadoop.io.Text, org.apache.hadoop.io.Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable longWritable, Text text, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            String line = text.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);

            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                outputCollector.collect(word, one);
            }
        }

        public void close() throws IOException {}

        public void configure(JobConf jobConf) {}
    }

    public static class Reduce implements Reducer<Text, IntWritable, org.apache.hadoop.io.Text, IntWritable> {

        public void reduce(Text text, Iterator<IntWritable> iterator, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            int sum = 0;

            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }

            outputCollector.collect(text, new IntWritable(sum));
        }

        public void close() throws IOException {}

        public void configure(JobConf jobConf) {}
    }
}

When I run it, I get the following error:

Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

I found some solutions on Stack Overflow; they boil down to setting an environment variable. The problem is that I want to connect to Hadoop on a remote machine, and Hadoop is not installed on my Windows machine.

What am I doing wrong? Have I missed something?
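For context, the `null` at the start of the path in the error message can be reproduced with a small sketch. It assumes the Hadoop 2.x client looks for its home directory in the `hadoop.home.dir` system property first and the `HADOOP_HOME` environment variable second, then appends `\bin\winutils.exe`. The class and method names below (`WinutilsLookup`, `resolveHadoopHome`, `qualifiedBinPath`) are hypothetical and only imitate that lookup; this is not Hadoop's own code.

```java
import java.io.File;

public class WinutilsLookup {

    // Simplified stand-in for the assumed lookup order: the "hadoop.home.dir"
    // system property wins, otherwise the HADOOP_HOME environment variable.
    // Returns null when neither is set.
    public static String resolveHadoopHome(String systemProperty, String envVariable) {
        return systemProperty != null ? systemProperty : envVariable;
    }

    // Builds home + \bin\ + executable. With a null home, Java string
    // concatenation produces the literal text "null", as in the error message.
    public static String qualifiedBinPath(String hadoopHome, String executable) {
        return hadoopHome + "\\bin\\" + executable;
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        String path = qualifiedBinPath(home, "winutils.exe");

        System.out.println("Hadoop home: " + home);
        System.out.println("Expected winutils location: " + path);
        System.out.println("File exists: " + new File(path).isFile());
    }
}
```

If this reading is right, the lookup happens on the client side, so it fails on a Windows machine with neither setting defined even when the cluster itself is remote. The Stack Overflow suggestions mentioned above amount to making one of the two values point at a local folder that contains `bin\winutils.exe`.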

Hadoop version is 2.7.1.2.4.0.0-169
HDP 2.4.0.0-169
Java 8
Windows 10