...

This approach requires giving both the Spark driver and the executors enough memory to store the broadcast dataset in memory. By default, Spark reserves slightly less than 30% of its memory for storing this type of data. When using in-memory joins, multiply the size of the dataset by 4 and use the result as the executor and driver memory. For example, if the items dataset were 1 GB in size, we would need to set the executor and driver memory to at least 4 GB. Slightly less than 30% of 4 GB still leaves a little over 1 GB for the broadcast data. Datasets larger than 8 GB cannot be loaded into memory this way.
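The sizing rule above is simple arithmetic. The following is a minimal Python sketch that assumes the dataset size is already known in gigabytes. The helper names broadcast_memory_gb and memory_setting are hypothetical and are not part of Spark. The sketch applies the 4x multiplier, rejects datasets over the 8 GB limit, and formats the result as a Spark memory string such as 4g.

```python
import math

# Values taken from the rule of thumb in the text above.
MEMORY_MULTIPLIER = 4   # executor/driver memory = 4 x dataset size
MAX_BROADCAST_GB = 8    # datasets above this cannot be loaded for in-memory joins


def broadcast_memory_gb(dataset_gb: float) -> int:
    """Hypothetical helper: minimum executor/driver memory (whole GB) for a broadcast dataset."""
    if dataset_gb <= 0:
        raise ValueError("dataset size must be positive")
    if dataset_gb > MAX_BROADCAST_GB:
        raise ValueError(
            f"{dataset_gb} GB exceeds the {MAX_BROADCAST_GB} GB limit for in-memory joins"
        )
    # Round up so the setting is never below 4x the dataset size.
    return math.ceil(dataset_gb * MEMORY_MULTIPLIER)


def memory_setting(dataset_gb: float) -> str:
    """Hypothetical helper: format the result as a Spark memory string, e.g. '4g'."""
    return f"{broadcast_memory_gb(dataset_gb)}g"


if __name__ == "__main__":
    size = 1.0  # the 1 GB items dataset from the example
    setting = memory_setting(size)
    # The same value is used for both the driver and the executors.
    print(f"--driver-memory {setting} --executor-memory {setting}")
```

The resulting value would typically be passed to both spark-submit options, --driver-memory and --executor-memory (equivalently, the spark.driver.memory and spark.executor.memory properties), because the text requires the driver and the executors to hold the broadcast dataset.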

...