Dynamic resource configuration
Users want to create dynamic pipelines for greater reusability and easier operations. This guide walks you through configuring the compute resources for a pipeline at runtime.
Background
Batch pipelines can be orchestrated using the MapReduce or Spark engines. The engine resources (CPU and memory) are typically configured at design time and can be changed at runtime from the UI in the Resources tab. In addition, users can also change the compute profile from the UI.
For dynamic pipelines, these resources should be configured via runtime arguments. The section below shows how to configure these resources.
Solution
Configuring Compute Profile
The compute profile can be configured at runtime using the system.profile.name runtime argument (preference). The value must include the scope and the profile name separated by a colon: scope:profileName.
For example, to start a pipeline called BQMerge on a profile called dp10 in the system scope, set system.profile.name to system:dp10.
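The same runtime argument can be passed programmatically when starting the pipeline. The following is a minimal sketch using the CDAP program lifecycle REST endpoint; the host, port, namespace, and workflow name (DataPipelineWorkflow, the default for batch data pipelines) are assumptions to adjust for your instance:

```shell
# Placeholders: point these at your CDAP instance and namespace.
CDAP_HOST="http://localhost:11015"
NAMESPACE="default"
PIPELINE="BQMerge"

# Runtime arguments are passed as a JSON map in the request body.
RUNTIME_ARGS='{"system.profile.name":"system:dp10"}'

# Start the batch pipeline's workflow with the runtime arguments.
curl -s -X POST \
  "${CDAP_HOST}/v3/namespaces/${NAMESPACE}/apps/${PIPELINE}/workflows/DataPipelineWorkflow/start" \
  -H "Content-Type: application/json" \
  -d "${RUNTIME_ARGS}"
```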
Note: As of 6.1.2, any configuration prefixed with system.profile is filtered out in the UI. This is fixed in 6.1.3.
Configuring Engine Resources
The engine resources, CPU (cores) and memory, can be configured using runtime arguments (preferences). To configure resources for the Spark driver and executors, use the following options:
- task.driver.system.resources.memory to configure the memory for the Spark driver. Memory is configured in megabytes. Example: setting task.driver.system.resources.memory to 2048 sets the driver memory to 2 GB (2048 MB).
- task.driver.system.resources.cores to configure the CPU (cores) for the Spark driver. By default, the driver CPU is set to 1 core. Example: setting task.driver.system.resources.cores to 2 sets the driver cores to 2.
- task.executor.system.resources.memory to configure the memory for the Spark executors. Memory is configured in megabytes. Example: setting task.executor.system.resources.memory to 2048 sets the executor memory to 2 GB (2048 MB).
- task.executor.system.resources.cores to configure the CPU (cores) for the Spark executors. By default, the executor CPU (cores) is set to 1 core. Example: setting task.executor.system.resources.cores to 2 configures 2 cores for all executors.
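Taken together, these options can be supplied as runtime arguments (preferences) before starting a run. For example, the following illustrative set (values are placeholders, not recommendations) gives the driver 2 GB of memory and 2 cores, and each executor 4 GB of memory and 2 cores:

```
task.driver.system.resources.memory=2048
task.driver.system.resources.cores=2
task.executor.system.resources.memory=4096
task.executor.system.resources.cores=2
```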
Configuring Compute Resources for Dataproc
- system.profile.properties.serviceAccount to set the service account for the Dataproc cluster.
- system.profile.properties.masterNumNodes to set the number of master nodes.
- system.profile.properties.masterMemoryMB to set the memory per master node.
- system.profile.properties.masterCPUs to set the number of CPUs for the master.
- system.profile.properties.masterDiskGB to set the disk in GB per master node.
- system.profile.properties.workerNumNodes to set the number of worker nodes.
- system.profile.properties.workerMemoryMB to set the memory per worker node.
- system.profile.properties.workerCPUs to set the number of CPUs per worker node.
- system.profile.properties.workerDiskGB to set the disk in GB per worker node.
- system.profile.properties.stackdriverLoggingEnabled to true to enable Stackdriver logging for the pipelines.
- system.profile.properties.stackdriverMonitoringEnabled to true to enable Stackdriver monitoring for the pipelines.
- system.profile.properties.imageVersion to configure the Dataproc image version.
- system.profile.properties.network to configure the network for the Dataproc cluster.
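As an illustration, the following runtime arguments (a sketch; all values are placeholders to adjust for your workload) would request worker nodes with 4 CPUs, 16 GB of memory, and 500 GB of disk each, with Stackdriver logging enabled:

```
system.profile.properties.workerNumNodes=3
system.profile.properties.workerCPUs=4
system.profile.properties.workerMemoryMB=16384
system.profile.properties.workerDiskGB=500
system.profile.properties.stackdriverLoggingEnabled=true
```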
Created in 2020 by Google Inc.