...

Keep in mind that Spark will request more memory than the executor memory set for the pipeline, and that YARN will round the requested amount up. For example, suppose you have set your executor memory to 2048 MB and have not given a value for spark.yarn.executor.memoryOverhead, which means the default of 384 MB is used. Spark will therefore request 2048 MB + 384 MB for each executor, which YARN will round up to an exact multiple of the YARN minimum allocation. When running on an 8 GB worker node, where the YARN minimum allocation is 512 MB, the request gets rounded up to 2.5 GB. Since roughly 6 GB of the node's memory is available to YARN, each worker can run two containers, using up all of the available CPUs but leaving 1 GB of YARN memory (6 GB - 2.5 GB - 2.5 GB) unused. This means the worker node can be sized a little smaller, or the executors can be given a little more memory. When running on a 16 GB worker node, 2048 MB + 384 MB gets rounded up to 3 GB because the YARN minimum allocation is 1024 MB. Each worker node is then able to run four containers, with all CPUs and YARN memory in use.
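The rounding described above can be sketched as a small calculation. The snippet below is an illustrative Python sketch only, not part of any pipeline configuration; the helper name is hypothetical, and the default overhead and minimum-allocation values simply mirror the example above.

```python
import math

def yarn_container_size_mb(executor_memory_mb,
                           memory_overhead_mb=384,
                           yarn_min_allocation_mb=1024):
    """Approximate the memory YARN allocates for one Spark executor container.

    Spark requests executor memory plus overhead, and YARN rounds that
    request up to the next multiple of its minimum allocation.
    (Illustrative sketch; values taken from the example in the text.)
    """
    requested = executor_memory_mb + memory_overhead_mb
    return math.ceil(requested / yarn_min_allocation_mb) * yarn_min_allocation_mb

# 8 GB worker node, 512 MB minimum allocation: 2048 + 384 -> 2560 MB (2.5 GB)
print(yarn_container_size_mb(2048, 384, 512))   # 2560
# 16 GB worker node, 1024 MB minimum allocation: 2048 + 384 -> 3072 MB (3 GB)
print(yarn_container_size_mb(2048, 384, 1024))  # 3072
```

Dividing the YARN memory available on the node by this container size gives the number of executors each worker can actually run, which is how the two- and four-container figures above are derived.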

...