...

A MapReduce program can interact with a dataset by using it as an input or an output. The dataset needs to implement specific interfaces to support this, as described in the following sections.

A Dataset as the Input Source of a MapReduce Program

When you run a MapReduce program, you can configure it to read its input from a dataset. The source dataset must implement the BatchReadable interface, which requires two methods:

...
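For reference, the interface has roughly this shape (a sketch based on CDAP's BatchReadable interface; the exact package and generic bounds may differ in your CDAP version):

Code Block
public interface BatchReadable<KEY, VALUE> {
  // Returns all splits of the dataset; the framework uses these to
  // partition the input among the mappers
  List<Split> getSplits();

  // Creates a reader that iterates over the records of one split
  SplitReader<KEY, VALUE> createSplitReader(Split split);
}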

Code Block
// Inject the dataset into the program by name
@UseDataSet("myTable")
KeyValueTable kvTable;
...
@Override
public void initialize() throws Exception {
  MapReduceContext context = getContext();
  ...
  // Read only the given key range, partitioned into 16 splits
  context.addInput(Input.ofDataset("myTable", kvTable.getSplits(16, startKey, stopKey)));
}

A Dataset as the Output Destination of a MapReduce Program

Just as you can read input from a dataset, you can write to a dataset as the output destination of a MapReduce program, provided that the dataset implements the BatchWritable interface:

...
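For reference, the interface amounts to a single method (a sketch; the exact package may differ in your CDAP version):

Code Block
public interface BatchWritable<KEY, VALUE> {
  // Called once for each output record emitted by the Reducer
  void write(KEY key, VALUE value);
}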

The write() method redirects all writes performed by a Reducer to the dataset. Again, the KEY and VALUE type parameters must match the output key and value type parameters of the Reducer.
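To direct the program's output to such a dataset, add it as an output in initialize(), mirroring the input example above (a sketch; "myTable" is a hypothetical dataset name):

Code Block
@Override
public void initialize() throws Exception {
  MapReduceContext context = getContext();
  // Send all writes performed by the Reducer to the dataset named "myTable"
  context.addOutput(Output.ofDataset("myTable"));
}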

Multiple Output Destinations of a MapReduce Program

To write to multiple output datasets from a MapReduce program, begin by adding the datasets as outputs:

...

Note that the multiple output write method, MapReduceTaskContext.write(String, KEY key, VALUE value), can only be used if there are multiple outputs. Similarly, the single output write method, MapReduceTaskContext.write(KEY key, VALUE value), can only be used if there is a single output to the MapReduce program.
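A minimal sketch of the two pieces, assuming hypothetical dataset names "output1" and "output2" (the Reducer obtains its MapReduceTaskContext by implementing ProgramLifecycle<MapReduceTaskContext>):

Code Block
// In the MapReduce program's initialize():
context.addOutput(Output.ofDataset("output1"));
context.addOutput(Output.ofDataset("output2"));

// In the Reducer, name the destination dataset on each write:
taskContext.write("output1", key, value);
taskContext.write("output2", key, value);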

...