Troubleshooting: Resolve program type errors

This page describes how to resolve a known issue where a data pipeline fails with an unsupported program type error in CDAP.

To reduce the start time of pipelines that run on GCP Dataproc, CDAP instances on versions 6.8.0 and 6.8.1 cache the artifacts that are required to start a pipeline on a Dataproc cluster in a Cloud Storage bucket. One of these cached artifacts is application.jar. Depending on the order in which you run your pipelines, some pipelines might fail with the following error:

  Unsupported program type: Spark

For example, after you create a new 6.8.1 instance (or upgrade to 6.8.1), the first run of a pipeline that contains only actions succeeds. However, subsequent runs of pipelines that include sources or sinks might fail with this error.

Recommendation

To resolve this issue, disable Cloud Storage caching by setting a preference or a runtime argument.

Note: With Cloud Storage caching disabled, a pipeline takes slightly longer to start because fewer artifacts are cached.

You can disable caching for any of the following:

  • For all pipelines in an instance.

  • For a given namespace.

  • For the specific Dataproc profiles that contain the failing pipelines.

  • For only the failing pipelines.

Disable Cloud Storage caching for all pipelines in an instance

To disable Cloud Storage caching for all pipelines in an instance, follow these steps:

  1. In CDAP, click System Admin > Configuration.

  2. In the System Preferences section, click Edit System Preferences.

  3. Set the value for system.profile.properties.gcsCacheEnabled to false.

Note: This change impacts start time for all pipelines in the instance.

To set this preference through the REST API, see Set preferences.
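As a rough illustration of the REST route, the following Python sketch builds (but does not send) an instance-level request. It assumes that instance preferences are set with a PUT to /v3/preferences that carries a JSON map of preferences. The function name, the base URL http://localhost:11015, and the optional bearer token are hypothetical placeholders that do not come from this page, so check the endpoint against Set preferences before relying on it.

```python
import json
from urllib.request import Request

# Preference key named on this page.
GCS_CACHE_KEY = "system.profile.properties.gcsCacheEnabled"


def build_instance_preferences_request(base_url, existing_prefs=None, token=None):
    """Build a PUT request that disables Cloud Storage caching instance-wide.

    Assumption: the instance-level preferences endpoint is /v3/preferences.
    Assumption: a PUT may replace the whole preference map at this level,
    so any existing preferences passed in are merged into the body first.
    """
    prefs = dict(existing_prefs or {})
    prefs[GCS_CACHE_KEY] = "false"

    headers = {"Content-Type": "application/json"}
    if token:
        # Hypothetical: only needed if the instance requires authentication.
        headers["Authorization"] = f"Bearer {token}"

    return Request(
        url=f"{base_url.rstrip('/')}/v3/preferences",
        data=json.dumps(prefs).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder base URL; replace it with your instance's API endpoint.
    req = build_instance_preferences_request("http://localhost:11015")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send the request, pass the returned object to urllib.request.urlopen. The request is built separately from sending so that the URL and body can be inspected first.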

Disable Cloud Storage caching for a given namespace

To disable Cloud Storage caching for a given namespace, follow these steps:

  1. In CDAP, click System Admin > Configuration.

  2. In the Namespaces section, select your namespace.

  3. Click Preferences > Edit and set the value for system.profile.properties.gcsCacheEnabled to false.

Note: This change impacts start time for all pipelines in the namespace.

To set this preference through the REST API, see Set preferences.
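For the namespace scope, a minimal Python sketch of the equivalent REST request might look like the following. It assumes that namespace preferences are set with a PUT to /v3/namespaces/<namespace>/preferences that carries a JSON map. The function name, the base URL, and the namespace "default" are illustrative placeholders only; verify the path against Set preferences.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_namespace_preferences_request(base_url, namespace, existing_prefs=None):
    """Build a PUT request that disables Cloud Storage caching for one namespace.

    Assumption: the endpoint is /v3/namespaces/<namespace>/preferences.
    Assumption: a PUT may replace the namespace's preference map, so existing
    preferences passed in are merged into the body before the key is set.
    """
    prefs = dict(existing_prefs or {})
    prefs["system.profile.properties.gcsCacheEnabled"] = "false"

    # Percent-encode the namespace so it is safe as a single path segment.
    url = f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}/preferences"
    return Request(
        url=url,
        data=json.dumps(prefs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder base URL and namespace.
    req = build_namespace_preferences_request("http://localhost:11015", "default")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```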

Disable Cloud Storage caching for a Dataproc profile

To disable Cloud Storage caching for the specific Dataproc profiles that contain the failing pipelines, follow these steps:

  1. In CDAP, click System Admin > Configuration.

  2. In the System Compute Profile section, find the Dataproc profile, click the three dots, and click Edit.

  3. Set gcsCacheEnabled to false in the Dataproc profile.

Disable Cloud Storage caching for only the failing pipelines

To disable Cloud Storage caching for only the failing pipelines, follow these steps:

  1. In CDAP, click the hamburger menu.

  2. Click List.

  3. Click the failing pipeline.

  4. Click Expand next to Run and set the runtime argument system.profile.properties.gcsCacheEnabled to false.

  5. Repeat for any other failing pipelines.

You can also disable Cloud Storage caching when you start a pipeline through the REST API by specifying runtime arguments as a JSON map in the request body. For more information, see Start a program.
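The following Python sketch shows one way such a start request could be assembled. It assumes that a batch pipeline is started with a POST to /v3/namespaces/<namespace>/apps/<pipeline>/workflows/DataPipelineWorkflow/start and that the body is the JSON map of runtime arguments. The function name, base URL, namespace, pipeline name, and the extra input.path argument are hypothetical; confirm the program type and ID for your pipeline in Start a program.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_start_request(base_url, namespace, pipeline_name, runtime_args=None):
    """Build a POST request that starts a pipeline with caching disabled.

    Assumption: batch pipelines run as the workflow "DataPipelineWorkflow";
    other pipeline types may use a different program type and ID.
    """
    # Keep the caller's runtime arguments and add the caching override.
    args = dict(runtime_args or {})
    args["system.profile.properties.gcsCacheEnabled"] = "false"

    url = (
        f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline_name, safe='')}"
        "/workflows/DataPipelineWorkflow/start"
    )
    return Request(
        url=url,
        data=json.dumps(args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own endpoint and pipeline.
    req = build_start_request(
        "http://localhost:11015",
        "default",
        "my_pipeline",
        {"input.path": "gs://bucket/in"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the runtime argument is sent with each start request, this approach affects only the runs that are started this way and leaves instance and namespace preferences unchanged.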

Created in 2020 by Google Inc.