...

  • Splits or partitions. A data set is divided into splits (also called partitions) so that it can be processed in parallel. If you don’t have enough splits, you can’t utilize the whole cluster.

  • Nodes (driver/master and worker). Physical or virtual machines that do the processing. Driver nodes run the driver processes that coordinate work; worker nodes run the executors that process data. Each node has machine characteristics: the amount of memory and the number of vCores available to processes.

  • vCores, cores or CPUs. The resource that does the computing. Nodes usually provide a fixed number of cores, and each executor requests one or a few of them. Balance this request against memory, otherwise you will end up underutilizing your cluster.

  • Executor processes. The processes that do the actual data processing. You define how much memory and how many CPUs each executor needs, and executors are scheduled onto your nodes accordingly. You want enough splits to keep all of the executors busy.
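The balance between memory, cores and splits described above can be checked with some simple arithmetic. The sketch below is illustrative only: the names `Node`, `ExecutorSpec`, `executors_per_node`, `unused_per_node` and `cluster_slots` are hypothetical, the node and executor sizes are made-up numbers, and it assumes that each executor core works on one split at a time and that no memory is reserved for the operating system or the resource manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Resources a worker node offers (hypothetical example type)."""
    memory_gb: int
    vcores: int


@dataclass(frozen=True)
class ExecutorSpec:
    """Resources a single executor requests (hypothetical example type)."""
    memory_gb: int
    vcores: int


def executors_per_node(node: Node, spec: ExecutorSpec) -> int:
    """Executors that fit on one node; the scarcer resource sets the limit."""
    return min(node.memory_gb // spec.memory_gb, node.vcores // spec.vcores)


def unused_per_node(node: Node, spec: ExecutorSpec) -> tuple[int, int]:
    """(memory_gb, vcores) left idle on a node once it is full of executors."""
    count = executors_per_node(node, spec)
    return (node.memory_gb - count * spec.memory_gb,
            node.vcores - count * spec.vcores)


def cluster_slots(workers: int, node: Node, spec: ExecutorSpec) -> int:
    """Splits the cluster can process at once.

    Assumption: one split per executor core at a time.
    """
    return workers * executors_per_node(node, spec) * spec.vcores


if __name__ == "__main__":
    node = Node(memory_gb=64, vcores=16)

    balanced = ExecutorSpec(memory_gb=8, vcores=2)
    memory_heavy = ExecutorSpec(memory_gb=16, vcores=2)

    for name, spec in (("balanced", balanced), ("memory-heavy", memory_heavy)):
        print(name,
              "executors/node:", executors_per_node(node, spec),
              "unused (GB, vCores):", unused_per_node(node, spec),
              "slots on 10 workers:", cluster_slots(10, node, spec))
```

With these example numbers, the memory-heavy request fills a node's memory while half of its vCores stay idle, which is the underutilization the list warns about. The slot count is also a rough lower bound on how many splits a data set needs before every executor has work to do.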


Table of Contents