DynamicPartitioner will open multiple record writers, one for each dynamic partition it is creating. Some output formats (for example, ORC) require a lot of memory when writing, and having multiple writers open can cause out-of-memory issues. To avoid that, there should be an option to keep at most one writer open at any given time. However, that will only work if the reducer (or mapper for a map-only job) emits all records for a partition consecutively. This cannot be generally assumed. Thus:
add an option to specify that only one writer should be open at once
close the current partition as soon as a key for a different partition is seen
if a partition is closed and we see another key for that partition, throw an error
DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped.
Instead of just limiting to one, perhaps the upper limit of writers can be configurable.
This would help in the case that the user's data is nearly completely ordered, but not completely.
Edit: There doesn't seem to be a strong use case for this, and this it would be difficult to enforce the data in such a way that this is useful, so not pursuing this.
Implemented in https://github.com/caskdata/cdap/pull/7056