ConcurrentPartitionConsumer WorkingSet does not aged out partitions that no longer exists
Description
Partitions that are in the WorkingSet and are later dropped will remain in the WorkingSet forever. To reproduce, added two partitions to a PartitionedFileSet and set the ConcurrentPartitionConsumer's ConsumerConfiguration with a maxWorkingSetSize of 1 and the timeout of 60. Execute consumePartitions and mark the partition as failed with onFinish. Drop the partition that was marked as failed. Call consumePartitions again and nothing will return because the failed partition that was dropped is still in the workingSet.
Release Notes
PartitionConsumer appropriately drops partitions that have been deleted from a corresponding PartitionedFileSet
Hey Ali, did you fix this in 3.5? I thought I saw the fix in the code.
Ali Anwar July 8, 2016 at 5:20 AM
Right - if the onFinish of the partitions (of the partition consumer) is done within the MapReduce job (such as in the onFinish method), then the MapReduce's transaction would get aborted. However, if the failure/exception is thrown from an action after the MapReduce, then the output of the MR won't get rollbacked, since it's in a separate transaction. I forgot about that earlier.
Fixed
Pinned fields
Click on the next to a field label to start pinning.
Partitions that are in the WorkingSet and are later dropped will remain in the WorkingSet forever. To reproduce, added two partitions to a PartitionedFileSet and set the ConcurrentPartitionConsumer's ConsumerConfiguration with a maxWorkingSetSize of 1 and the timeout of 60. Execute consumePartitions and mark the partition as failed with onFinish. Drop the partition that was marked as failed. Call consumePartitions again and nothing will return because the failed partition that was dropped is still in the workingSet.