Sink should not be included in multiple mapreduce phases

Description

The mapreduce pipeline planner can place the same sink in multiple mapreduce phases. For some sinks this is OK, but for others it is not. For example, I believe partitioned file set sinks will fail: whichever job finishes first will successfully add a partition, and the second job will then try to add that same partition and fail.
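To make the suspected failure mode concrete, here is a minimal sketch in Python. It is a toy model, not the CDAP API: `ToyPartitionedSink`, `add_partition`, and `run_phases` are hypothetical names, and the assumption that adding an existing partition raises an error comes from the description above rather than from the actual sink code.

```python
class ToyPartitionedSink:
    """Toy stand-in for a partitioned sink (hypothetical, not the CDAP API)."""

    def __init__(self):
        self.partitions = set()

    def add_partition(self, key):
        # Assumption: adding a partition that already exists is an error.
        if key in self.partitions:
            raise ValueError(f"partition {key!r} already exists")
        self.partitions.add(key)


def run_phases(sink, phase_names, partition_key):
    """Run each phase in order; every phase tries to add the same partition."""
    results = {}
    for name in phase_names:
        try:
            sink.add_partition(partition_key)
            results[name] = "succeeded"
        except ValueError as err:
            results[name] = f"failed: {err}"
    return results


# The same sink is written by two phases, so only the first write succeeds.
outcome = run_phases(ToyPartitionedSink(), ["phase-1", "phase-2"], "2018-02-20")
print(outcome)
```

In this sketch the phases run sequentially for simplicity; in a real run, whichever job finishes first would play the role of `phase-1`.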

Instead, the planner should use connectors so that each sink is written to exactly once, in a single mapreduce job, similar to how we ensure that a source is only read from once in a single mapreduce job.
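The invariant described above can be expressed as a simple check over a plan. The following Python sketch is illustrative only: the plan representation (a dict from phase name to stage names) and the function `sinks_in_multiple_phases` are assumptions made for this example, not the planner's real data structures.

```python
from collections import defaultdict


def sinks_in_multiple_phases(phases, sinks):
    """Return {sink: sorted phase names} for sinks that appear in more than one phase.

    phases: dict mapping a phase name to the list of stage names in that phase
            (hypothetical representation of a plan).
    sinks:  set of stage names that are sinks.
    """
    seen = defaultdict(set)
    for phase_name, stages in phases.items():
        for stage in stages:
            if stage in sinks:
                seen[stage].add(phase_name)
    return {sink: sorted(names) for sink, names in seen.items() if len(names) > 1}


# A plan in which the same sink was placed in two phases.
plan = {
    "phase-1": ["source", "transform-a", "sink"],
    "phase-2": ["connector", "transform-b", "sink"],
}
violations = sinks_in_multiple_phases(plan, sinks={"sink"})
print(violations)  # {'sink': ['phase-1', 'phase-2']}
```

A plan that satisfies the invariant yields an empty dict, which makes the check easy to use as an assertion in a planner test.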

An example pipeline that causes this issue looks like:

Release Notes

Fixed a planner bug so that a sink is never placed in two different mapreduce phases of the same pipeline.

Activity

Albert Shau
February 20, 2018, 11:58 PM
Fixed

Assignee

Albert Shau

Reporter

Albert Shau

Labels

None

Docs Impact

None

UX Impact

None

Fix versions

Priority

Major