Dynamic resource configuration
Users want to create dynamic pipelines for greater reusability and ease of operations. This guide walks you through configuring compute resources for a pipeline at runtime.
Background
Batch pipelines can be orchestrated using the MapReduce or Spark engines. The engine resources (CPU and memory) are typically configured at design time, and can be changed at runtime from the UI in the Resources tab. The compute profile can also be changed from the UI.
For dynamic pipelines, these resources should be configured via runtime arguments. The section below shows how to configure these resources.
Solution
Configuring Compute Profile
The compute profile can be configured at runtime using the system.profile.name runtime argument (preference). The value should include the scope and profile name separated by a colon: scope:profileName.
The following example starts the pipeline called BQMerge on a profile called dp10 in system scope:
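As a minimal sketch, one way to start this run is through the CDAP program lifecycle REST API, passing the runtime argument as the JSON request body. The instance URL and the default namespace below are assumptions for illustration; DataPipelineWorkflow is the standard program name for batch data pipelines.

    import requests

    # Hypothetical CDAP instance URL; replace with your own.
    CDAP_URL = "http://localhost:11015"

    # system.profile.name takes the form scope:profileName.
    runtime_args = {"system.profile.name": "SYSTEM:dp10"}

    # Start the BQMerge pipeline, passing the runtime arguments as the JSON body.
    resp = requests.post(
        f"{CDAP_URL}/v3/namespaces/default/apps/BQMerge"
        "/workflows/DataPipelineWorkflow/start",
        json=runtime_args,
    )
    resp.raise_for_status()

The same key/value pair can instead be set as a runtime argument or preference in the UI before starting the run.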
As of 6.1.2, any configuration prefixed with system.profile is filtered out in the UI. This is fixed in 6.1.3.
Configuring Engine Resources
The engine resources, CPU (cores) and memory, can be configured using runtime arguments (preferences). To configure resources for the Spark driver and executors, use the following options (a combined example follows the list):
- task.driver.system.resources.memory to configure the memory for the Spark driver, in megabytes. Example: setting task.driver.system.resources.memory to 2048 sets the driver memory to 2 GB (2048 MB).
- task.driver.system.resources.cores to configure the CPU (cores) for the Spark driver. By default, the driver CPU is set to 1 core. Example: setting task.driver.system.resources.cores to 2 sets the driver cores to 2.
- task.executor.system.resources.memory to configure the memory for Spark executors, in megabytes. Example: setting task.executor.system.resources.memory to 2048 sets the executor memory to 2 GB (2048 MB).
- task.executor.system.resources.cores to configure the CPU (cores) for Spark executors. By default, the executor CPU is set to 1 core. Example: setting task.executor.system.resources.cores to 2 configures 2 cores for all executors.
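For example, a sketch of the runtime arguments for a run with a 2 GB, 2-core driver and 4 GB, 2-core executors (the values are illustrative):

    # Engine-resource runtime arguments; memory values are in megabytes.
    runtime_args = {
        "task.driver.system.resources.memory": "2048",    # driver memory: 2 GB
        "task.driver.system.resources.cores": "2",        # driver cores
        "task.executor.system.resources.memory": "4096",  # memory per executor: 4 GB
        "task.executor.system.resources.cores": "2",      # cores per executor
    }
    # Pass runtime_args as the JSON body of the start call, as in the
    # compute profile example above.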
Configuring Compute Resources for Dataproc
To configure the Dataproc cluster created for a compute profile, use the following options (a combined example follows the list):
- system.profile.properties.serviceAccount to set the service account for the Dataproc cluster.
- system.profile.properties.masterNumNodes to set the number of master nodes.
- system.profile.properties.masterMemoryMB to set the memory per master node, in MB.
- system.profile.properties.masterCPUs to set the number of CPUs per master node.
- system.profile.properties.masterDiskGB to set the disk size per master node, in GB.
- system.profile.properties.workerNumNodes to set the number of worker nodes.
- system.profile.properties.workerMemoryMB to set the memory per worker node, in MB.
- system.profile.properties.workerCPUs to set the number of CPUs per worker node.
- system.profile.properties.workerDiskGB to set the disk size per worker node, in GB.
- system.profile.properties.stackdriverLoggingEnabled to true to enable Stackdriver logging for the pipelines.
- system.profile.properties.stackdriverMonitoringEnabled to true to enable Stackdriver monitoring for the pipelines.
- system.profile.properties.imageVersion to configure the Dataproc image version.
- system.profile.properties.network to configure the network for the Dataproc cluster.
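For instance, a sketch of runtime arguments sizing the Dataproc cluster with one 4-CPU master and five 8-CPU workers (all values are illustrative and override the corresponding properties of the selected compute profile):

    # Dataproc cluster overrides for a single run.
    runtime_args = {
        "system.profile.properties.masterNumNodes": "1",
        "system.profile.properties.masterCPUs": "4",
        "system.profile.properties.masterMemoryMB": "15360",   # 15 GB
        "system.profile.properties.masterDiskGB": "500",
        "system.profile.properties.workerNumNodes": "5",
        "system.profile.properties.workerCPUs": "8",
        "system.profile.properties.workerMemoryMB": "30720",   # 30 GB
        "system.profile.properties.workerDiskGB": "1000",
        "system.profile.properties.stackdriverLoggingEnabled": "true",
    }
    # Pass runtime_args as the JSON body of the start call, as in the
    # compute profile example above.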