
AutoJoin ignores num partitions

Description

The auto-joiner implementation currently ignores the number of partitions provided by the plugin.

Spark's RDD API lets you specify the number of partitions to use when shuffling, but Spark's Dataset API does not. Instead, it uses the value of 'spark.sql.shuffle.partitions' (default 200) from the SparkConf.
We need to determine whether we can still support a per-join value (for example, by setting the Spark conf immediately before the join), or whether this needs to become a pipeline-level setting.
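A minimal sketch of the per-join workaround mentioned above, assuming a live SparkSession and two hypothetical input Datasets (the helper name, datasets, and join key are illustrative, not part of the actual auto-joiner code). Note that 'spark.sql.shuffle.partitions' is a session-wide SQL conf, and Datasets are evaluated lazily, so the temporary value must still be in effect when the join is physically planned:

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

// Hypothetical helper: perform a join with a temporary shuffle
// partition count, then restore the session's previous value.
def joinWithPartitions(spark: SparkSession,
                       left: Dataset[Row],
                       right: Dataset[Row],
                       joinKey: String,
                       numPartitions: Int): Dataset[Row] = {
  val confKey = "spark.sql.shuffle.partitions"
  val previous = spark.conf.get(confKey)
  spark.conf.set(confKey, numPartitions.toString)
  try {
    val joined = left.join(right, joinKey)
    // Datasets are lazy: force physical planning so the shuffle
    // picks up the temporary value before the conf is restored.
    joined.queryExecution.executedPlan
    joined
  } finally {
    spark.conf.set(confKey, previous)
  }
}
```

An alternative that is genuinely per-join is to repartition the inputs explicitly on the join key, e.g. `left.repartition(numPartitions, col(joinKey))`, though the join itself may still introduce its own shuffle governed by the session conf.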

Release Notes

None

Activity

Albert Shau
June 17, 2020, 12:24 AM
Albert Shau
June 18, 2020, 6:05 AM
Fixed

Assignee

Albert Shau

Reporter

Albert Shau

Labels

None

Docs Impact

None

UX Impact

None

Components

Fix versions

Priority

Blocker