Users often want dynamic pipelines for greater reusability and easier operations. This guide walks you through configuring, at runtime, the compute resources that a pipeline runs on.

Background

Batch pipelines can be orchestrated using the MapReduce or Spark engines. Engine resources (CPU and memory) are typically configured at design time and can be changed at runtime from the UI by editing the values in the Resources tab. Users can also change the compute profile from the UI.

For dynamic pipelines, these resources should be configured via runtime arguments. The section below shows how to configure these resources.

Solution

Configuring Compute Profile

The compute profile can be configured at runtime using the system.profile.name runtime argument (preferences). The value must include the scope and the profile name separated by a colon, in the form scope:profileName.

The following example starts the pipeline called BQMerge on a profile called dp10 in system scope:
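As a minimal sketch of such a call, the Python below builds (but does not send) a request that starts BQMerge with the system.profile.name argument set to system:dp10. Several details are assumptions not stated in this guide: the base URL http://localhost:11015, the default namespace, the lifecycle path ending in /workflows/DataPipelineWorkflow/start, and the lowercase spelling of the scope. The helper names build_start_request and profile_argument are hypothetical. Check the path and scope spelling against the REST API reference for your version before using it.

```python
import json
import urllib.request


def profile_argument(scope, profile_name):
    """Return the runtime argument that selects a compute profile.

    The value follows the scope:profileName form described above.
    """
    return {"system.profile.name": f"{scope}:{profile_name}"}


def build_start_request(base_url, namespace, pipeline, runtime_args):
    """Build, without sending, a POST request that starts a batch pipeline.

    ASSUMPTION: the start endpoint path below is the usual lifecycle path
    for batch pipelines; verify it for your installation.
    """
    url = (
        f"{base_url}/v3/namespaces/{namespace}/apps/{pipeline}"
        "/workflows/DataPipelineWorkflow/start"
    )
    # Runtime arguments travel as a JSON object in the request body.
    body = json.dumps(runtime_args).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# ASSUMPTION: base URL and namespace are placeholders for illustration.
request = build_start_request(
    "http://localhost:11015",
    "default",
    "BQMerge",
    profile_argument("system", "dp10"),
)
```

To actually start the pipeline, pass the request to urllib.request.urlopen, adding whatever authentication header your instance requires.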

As of 6.1.2, any configuration prefixed with system.profile is filtered out in the UI. This is fixed in 6.1.3.

Configuring Engine Resources

The engine resources, CPU (cores) and memory, can be configured using runtime arguments (preferences). To configure resources for the Spark driver and executor, use the following options:
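The option names themselves are not reproduced in this guide, so the sketch below relies on assumed keys: task.driver.system.resources.memory, task.driver.system.resources.cores, and the matching task.executor keys, with memory expressed in megabytes. Treat these names and the unit as assumptions to confirm against the product documentation. The helper spark_resource_arguments is hypothetical and only shows how such arguments could be assembled as a string-to-string map.

```python
def spark_resource_arguments(driver_memory_mb, driver_cores,
                             executor_memory_mb, executor_cores):
    """Assemble runtime arguments for Spark driver and executor resources.

    ASSUMPTION: the key names and the megabyte unit are not taken from this
    guide; verify them before relying on this sketch.
    """
    values = {
        "task.driver.system.resources.memory": driver_memory_mb,
        "task.driver.system.resources.cores": driver_cores,
        "task.executor.system.resources.memory": executor_memory_mb,
        "task.executor.system.resources.cores": executor_cores,
    }
    for key, value in values.items():
        # Reject booleans, non-integers, and non-positive sizes early.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    # Runtime arguments are passed as strings.
    return {key: str(value) for key, value in values.items()}


# Example: 2 GB / 1 core for the driver, 4 GB / 2 cores per executor.
resource_args = spark_resource_arguments(2048, 1, 4096, 2)
```

These arguments can be merged with the compute profile argument into a single map and supplied when the pipeline is started.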

Configuring Compute Resources for Dataproc