When using multiple sinks that use OutputFormats (e.g. S3), the mapper output metrics are incorrect: the map input record count is right, but the map output record count is not.
To reproduce, create an ETL batch pipeline that reads from a stream and writes to both S3 and a Table.
Fix a bug where certain MapReduce metrics were not being properly emitted when using multiple outputs.
The DataCleansing example application can also be used to reproduce this, since that MapReduce job has two output datasets.
Normally, the user calls context#write(key, value), where the context is a Hadoop class that increments the output counter in addition to writing the record.
In the case of multiple outputs, the user calls context#write(outputName, key, value). Here the context
is a CDAP class, and this call never translates into a call to Hadoop's context#write. Because of that, the
metrics for output records aren't automatically incremented.
To fix this, use a MeteredRecordWriter that increments the metric itself. Merged https://github.com/caskdata/cdap/pull/4799
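The idea behind the fix is a RecordWriter wrapper that bumps the output-records metric itself before delegating. A minimal sketch, using simplified stand-ins for Hadoop's RecordWriter and Counter (the class bodies and counter names here are illustrative, not CDAP's actual implementation, which wraps org.apache.hadoop.mapreduce.RecordWriter):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Hadoop's RecordWriter and Counter, so the
// sketch compiles without Hadoop on the classpath.
interface RecordWriter<K, V> {
    void write(K key, V value) throws Exception;
}

class Counter {
    private long count;
    public void increment(long n) { count += n; }
    public long getValue() { return count; }
}

// MeteredRecordWriter: delegates each write and increments the
// output-records counter, restoring the metric that the CDAP
// multi-output write path skips.
class MeteredRecordWriter<K, V> implements RecordWriter<K, V> {
    private final RecordWriter<K, V> delegate;
    private final Counter outputRecords;

    MeteredRecordWriter(RecordWriter<K, V> delegate, Counter outputRecords) {
        this.delegate = delegate;
        this.outputRecords = outputRecords;
    }

    @Override
    public void write(K key, V value) throws Exception {
        outputRecords.increment(1);   // count the record ourselves...
        delegate.write(key, value);   // ...then hand it to the real writer
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        List<String> sink = new ArrayList<>();
        Counter counter = new Counter();
        RecordWriter<String, String> metered =
            new MeteredRecordWriter<>((k, v) -> sink.add(k + "=" + v), counter);

        metered.write("user", "alice");
        metered.write("user", "bob");

        // The written records and the metric now agree.
        System.out.println(sink.size() + " records, counter=" + counter.getValue());
        // prints "2 records, counter=2"
    }
}
```

Because the counting happens inside the writer rather than the context, it fires regardless of which context class (Hadoop's or CDAP's) routed the record there.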