...

A GCP project ID must be provided. This will be the project that the Cloud Dataproc cluster is created in. The project must have the Cloud Dataproc APIs enabled.

Creator Service Account Key

The service account key provided to the provisioner must have permission to access the Cloud Dataproc APIs and the Google Compute Engine APIs. Because the account key is sensitive, we recommend that you provide it through CDAP Secure Storage by adding a secure key using the Microservices. After you create the secure key, you can add it to a namespace or system compute profile. For a namespace compute profile, click the shield icon and select the secure key. For a system compute profile, type the name of the key in the Secure Account Key field.
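As a rough illustration of adding a secure key through the Microservices, the following Python sketch builds (but does not send) an HTTP request. It assumes the Secure Storage endpoint has the form `PUT /v3/namespaces/<namespace>/securekeys/<key-name>` with a JSON body containing `description`, `data`, and `properties`; verify this against the Microservices reference for your CDAP version. The base URL, namespace, key name, and the function name `build_secure_key_request` are hypothetical.

```python
import json
import urllib.request


def build_secure_key_request(base_url, namespace, key_name, key_json, description=""):
    """Build a PUT request that would store key_json as a secure key.

    The endpoint path and body fields are assumptions based on the CDAP
    Secure Storage Microservices; confirm them for your CDAP version.
    """
    url = f"{base_url.rstrip('/')}/v3/namespaces/{namespace}/securekeys/{key_name}"
    body = json.dumps(
        {"description": description, "data": key_json, "properties": {}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values; the request is only constructed here, not sent.
request = build_secure_key_request(
    "http://localhost:11015", "default", "gcp-creator-key", "{}"
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` along with whatever authentication your CDAP instance requires.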

...

Assign network tags to apply firewall rules to specific nodes of a cluster. Network tags must start with a lowercase letter, can contain only lowercase letters, numbers, and hyphens, and must end with a lowercase letter or number.
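The naming rule for network tags can be checked before a profile is saved. The following Python sketch encodes only the rule stated here (it does not check any length limit the platform may impose); the function names are illustrative and not part of CDAP.

```python
import re

# Start with a lowercase letter; then lowercase letters, digits, or hyphens;
# end with a lowercase letter or digit. A single letter is also valid.
_TAG_PATTERN = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_network_tag(tag):
    """Return True if tag satisfies the naming rule described above."""
    return _TAG_PATTERN.fullmatch(tag) is not None


def invalid_network_tags(tags):
    """Return the tags that violate the naming rule, in input order."""
    return [tag for tag in tags if not is_valid_network_tag(tag)]


print(invalid_network_tags(["web-server-1", "1web", "web-", "Web"]))
```

In this example, only `web-server-1` passes: `1web` starts with a digit, `web-` ends with a hyphen, and `Web` contains an uppercase letter.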

Shielded VMs

Note: Shielded VM settings were introduced in CDAP 6.5.0.

Enable Secure Boot

Defines whether the Dataproc VMs have Secure Boot enabled.

Default is False.

Enable vTPM

Defines whether the Dataproc VMs have the virtual Trusted Platform Module (vTPM) enabled.

Default is False.

Enable Integrity Monitoring

Defines whether Dataproc VMs have integrity monitoring enabled.

Default is False.

Advanced Settings

Image Version

...

Cluster properties used to override default configuration properties for the Hadoop services. For example, the default Spark parallelism can be overridden by setting a value for spark:spark.default.parallelism. For more information, see Cluster properties.
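Cluster property names follow a `prefix:property` form, where the prefix identifies the configuration being overridden (`spark` in the example above). The following Python sketch splits such names and groups a set of overrides by prefix. The helper names and the `yarn:` sample entry are illustrative assumptions, not CDAP or Dataproc APIs.

```python
from typing import Dict, Tuple


def split_cluster_property(name: str) -> Tuple[str, str]:
    """Split 'prefix:property' into (prefix, property) at the first colon."""
    prefix, sep, key = name.partition(":")
    if not sep or not prefix or not key:
        raise ValueError(f"expected 'prefix:property', got {name!r}")
    return prefix, key


def group_by_prefix(properties: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group property overrides by their prefix."""
    grouped: Dict[str, Dict[str, str]] = {}
    for name, value in properties.items():
        prefix, key = split_cluster_property(name)
        grouped.setdefault(prefix, {})[key] = value
    return grouped


# Sample overrides; the values are arbitrary.
overrides = {
    "spark:spark.default.parallelism": "64",
    "yarn:yarn.nodemanager.resource.memory-mb": "8192",
}
print(group_by_prefix(overrides))
```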

Labels

Note: Labels were introduced in CDAP 6.5.0.

A label is a key-value pair that helps you organize your Google Cloud Dataproc clusters and jobs. You can attach a label to each resource, and then filter the resources based on their labels. Information about labels is forwarded to the billing system, so you can break down your billing charges by label.

Specifies labels for the Dataproc cluster being created.

Max Idle Time

Configure Dataproc to delete the cluster if it has been idle for longer than this many minutes. Clusters are normally deleted directly after a run ends, but the deletion can fail in rare situations, for example if permissions are revoked in the middle of a run or if there is a Dataproc outage. Use this setting to ensure that clusters are eventually deleted even if the instance is unable to delete the cluster for any reason.
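This setting is expressed in minutes, whereas the Dataproc API represents an idle-delete timeout as a duration string in seconds (for example `1800s`). The following Python sketch shows that conversion for illustration only. How CDAP passes the value to Dataproc internally is an assumption here, and the function name is hypothetical.

```python
def idle_minutes_to_ttl(minutes: int) -> str:
    """Convert an idle time in minutes to a seconds-based duration string."""
    if minutes <= 0:
        raise ValueError("idle time must be a positive number of minutes")
    return f"{minutes * 60}s"


print(idle_minutes_to_ttl(30))
```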

...