By default, the number of worker nodes is set to 2. To increase parallelism for large workloads, or for a pipeline with multiple Deduplicate, aggregate, or Joiner plugins, configure a Dataproc compute profile with a larger number of workers.

Number of Partitions

By default, the number of partitions is not set in the Joiner, Deduplicate, and aggregate plugins. This allows the underlying framework (Spark) to determine the partitions. If the number of partitions is changed manually, ensure that it is less than the number of executors (in the case of dynamic allocation, the number of containers per node).
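
The partition guideline can be expressed as a simple check before a pipeline is deployed. The following is a minimal sketch only: the function name and its parameters are hypothetical and are not part of any plugin or Spark API. It assumes you already know the executor count (or, with dynamic allocation, the number of containers per node) for your compute profile.

```python
def partitions_within_limit(num_partitions: int, num_executors: int) -> bool:
    """Check a manually chosen partition count against the executor count.

    Hypothetical helper for illustration. With dynamic allocation, pass
    the number of containers per node as num_executors.
    """
    if num_partitions < 1 or num_executors < 1:
        raise ValueError("counts must be positive integers")
    # Guideline from the text: partitions should be fewer than executors.
    return num_partitions < num_executors


if __name__ == "__main__":
    # Example values only; substitute the numbers for your own profile.
    print(partitions_within_limit(num_partitions=4, num_executors=8))
```

If the check fails, either lower the partition count or leave it unset so that Spark determines the partitions.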