...
Advanced Settings
Image Version
The Dataproc image version. If none is given, one is chosen automatically. If a Custom Image URI is specified, this field is ignored.
Custom Image URI
The Dataproc image URI. If no URI is specified, the image is inferred from the Image Version.
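The precedence between the two image fields can be sketched as follows. This is a minimal illustration, not the provisioner's actual code: the function name `resolve_image` and the example values are hypothetical placeholders.

```python
from typing import Optional


def resolve_image(image_version: Optional[str],
                  custom_image_uri: Optional[str]) -> dict:
    """Report which image setting takes effect (hypothetical helper)."""
    # A custom image URI wins; the image version is then ignored.
    if custom_image_uri:
        return {"source": "custom_image_uri", "value": custom_image_uri}
    # Otherwise the image is inferred from the image version, if given.
    if image_version:
        return {"source": "image_version", "value": image_version}
    # With neither field set, a version is chosen automatically.
    return {"source": "automatic", "value": None}


# Placeholder values for illustration only.
print(resolve_image("2.1", None))
print(resolve_image("2.1", "projects/my-project/global/images/my-image"))
```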
GCS Bucket
The Cloud Storage bucket used by Cloud Dataproc to read and write cluster and job data.
Encryption Key Name
The name of the GCP customer-managed encryption key (CMEK) used by Cloud Dataproc.
Autoscaling Policy
Specify the Autoscaling Policy ID (name) or the resource URI.
For information about configuring and using Dataproc autoscaling to automatically and dynamically resize clusters to meet workload demands, see the Autoscaling clusters guide.
Recommended: Use autoscaling policies for increasing the cluster size, not for decreasing the size. Decreasing the cluster size with autoscaling removes nodes that hold intermediate data, which might cause your pipelines to run slowly or fail.
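Because the field accepts either a policy ID or a resource URI, a small helper can normalize both forms before they are stored or compared. The sketch below assumes the resource URI follows the pattern `projects/<project>/regions/<region>/autoscalingPolicies/<policy_id>` (with `locations` accepted in place of `regions`); confirm the exact form against the Dataproc documentation. The function name and the project, region, and policy values are hypothetical.

```python
import re

# Assumed shape of a full autoscaling policy resource URI.
_POLICY_URI = re.compile(
    r"^projects/[^/]+/(?:regions|locations)/[^/]+/autoscalingPolicies/[^/]+$"
)


def to_policy_uri(value: str, project: str, region: str) -> str:
    """Expand a bare policy ID into a resource URI (hypothetical helper)."""
    value = value.strip()
    if _POLICY_URI.match(value):
        # Already a full resource URI; return it unchanged.
        return value
    if "/" in value:
        raise ValueError(f"not a policy ID or a recognized resource URI: {value!r}")
    # Treat anything without a slash as a bare policy ID.
    return f"projects/{project}/regions/{region}/autoscalingPolicies/{value}"


print(to_policy_uri("my-policy", "my-project", "us-central1"))
```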
Initialization Actions
A list of scripts to run while the cluster initializes. Initialization action scripts should be stored in Google Cloud Storage.
Cluster Properties
Cluster properties used to override default configuration properties for the Hadoop services. For example, the default Spark parallelism can be overridden by setting a value for spark:spark.default.parallelism. For more information, see Cluster properties.
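Each cluster property key has the form `prefix:property`, where the prefix names the service configuration to change and the rest is the property inside it. The following sketch splits and validates such keys; the helper name and the value `200` are illustrative assumptions, not recommended settings.

```python
from typing import Tuple


def split_cluster_property(key: str) -> Tuple[str, str]:
    """Split 'prefix:property' into its two parts (hypothetical helper)."""
    prefix, sep, name = key.partition(":")
    if not sep or not prefix or not name:
        raise ValueError(f"expected 'prefix:property', got {key!r}")
    return prefix, name


# Example property map; the value is a placeholder.
properties = {"spark:spark.default.parallelism": "200"}

for key, value in properties.items():
    prefix, name = split_cluster_property(key)
    print(f"service={prefix} property={name} value={value}")
```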
...
Whether to skip cluster deletion at the end of a run. If enabled, clusters must be deleted manually. Use this only when debugging a failed run.
Default is False.
Enable Stack Driver Logging Integration
...
Enable Component Gateway to allow access to cluster UIs like the YARN ResourceManager and Spark HistoryServer.
Default is False.
Prefer External IP
When the system is running on Google Cloud Platform in the same network as the cluster, it will normally use the internal IP when communicating with the cluster. Set to True to always use the external IP.
Default is False.
Polling Settings
Polling settings control how often cluster status should be polled when creating and deleting clusters. You may want to change these settings if you have a lot of pipelines scheduled to run at the same time using the same GCP account.
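To illustrate why these settings matter when many pipelines start together, here is a minimal sketch of a status-polling loop with an initial delay, random jitter, and a fixed interval. All parameter names (`initial_delay_s`, `jitter_s`, `interval_s`, `max_polls`) are hypothetical and do not correspond to the actual setting names; the loop is an assumption about the general technique, not the provisioner's implementation.

```python
import random
import time
from typing import Callable


def poll_until(check: Callable[[], bool], *,
               initial_delay_s: float,
               jitter_s: float,
               interval_s: float,
               max_polls: int,
               sleep: Callable[[float], None] = time.sleep,
               rng: Callable[[], float] = random.random) -> bool:
    """Poll check() until it returns True or max_polls is reached."""
    # Random jitter spreads out the first poll so that pipelines
    # started at the same time do not hit the API in lockstep.
    sleep(initial_delay_s + rng() * jitter_s)
    for _ in range(max_polls):
        if check():
            return True
        sleep(interval_s)
    return False


# Simulated cluster that reports "ready" on the third status check.
statuses = iter([False, False, True])
ready = poll_until(lambda: next(statuses),
                   initial_delay_s=0.0, jitter_s=0.01,
                   interval_s=0.01, max_polls=5)
print("cluster ready:", ready)
```

Longer delays and intervals reduce the number of status requests per account, at the cost of noticing state changes later.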
...