Page Comparison

...

Starting CDAP 6.4.0, with adaptive query execution, really heavy skews will be handled automatically. As soon as a join produces some partitions much bigger than others, they are split into smaller ones. Please see the Autotuning section above to To confirm you have adaptive query execution enabled, see the Autotuning section above.

In-Memory Joins

In-memory joins were added in version 6.1.3. An in-memory join can be performed if one side of the join is small enough to fit in memory. In this situation, the small dataset is loaded into memory, and then broadcast to every executor. This means the large dataset is not shuffled at all, removing the uneven partitions generated when shuffling on the join key.

...

Versions Compared

Old Version 3

New Version 4

Key

In-Memory Joins