Pipelines failing with `Unsupported program type: Spark` due to caching application.jar in GCS

Description

Reason: if the very first run on CDAP does not involve Spark (e.g., an action-only pipeline), the cached application.jar does not include the Spark classes and cannot run pipelines with Spark.

How to reproduce:

  1. Delete cached_artifacts folder in GCS.

  2. Run a pipeline that only contains an action.

  3. Try to run the QuickStart pipeline. This will result in pipeline failure with the following error:

2023-03-10 21:27:29,512 - ERROR [main:i.c.c.i.a.r.d.r.DefaultRuntimeJob@342] - Failed to execute program program_run:default.us-west2-3.-SNAPSHOT.workflow.DataPipelineWorkflow.4a817ab2-bf8a-11ed-a59b-4ef69e077cbc
java.lang.IllegalArgumentException: Unsupported program type: Spark
    at io.cdap.cdap.app.guice.DefaultProgramRunnerFactory.create(DefaultProgramRunnerFactory.java:89)
    at io.cdap.cdap.internal.app.runtime.distributed.DistributedWorkflowProgramRunner.setupLaunchConfig(DistributedWorkflowProgramRunner.java:150)
    at io.cdap.cdap.internal.app.runtime.distributed.DistributedProgramRunner.run(DistributedProgramRunner.java:205)
    at io.cdap.cdap.internal.app.runtime.distributed.runtimejob.DefaultRuntimeJob.run(DefaultRuntimeJob.java:283)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at io.cdap.cdap.runtime.spi.runtimejob.DataprocJobMain.main(DataprocJobMain.java:142)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

 

Workaround: set the following property in System Preferences:

system.profile.properties.gcsCacheEnabled = false
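The same preference can also be set programmatically through CDAP's Preferences REST API (a PUT of a JSON key/value map to /v3/preferences). A minimal sketch, assuming a hypothetical hostname and the default router port:

```python
import json
import urllib.request

def build_preference_request(host, key, value, port=11015):
    """Build a PUT request that sets a CDAP system-level preference.

    The CDAP Preferences REST API accepts a JSON map of preference
    keys to values at the /v3/preferences endpoint.
    """
    url = f"http://{host}:{port}/v3/preferences"
    body = json.dumps({key: value}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    return req

# Example: disable GCS artifact caching. "cdap.example.com" is a
# placeholder for your CDAP instance; send with urllib.request.urlopen(req).
req = build_preference_request(
    "cdap.example.com", "system.profile.properties.gcsCacheEnabled", "false")
print(req.get_method(), req.full_url)
```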

 

Another workaround is to delete application.jar from the <bucket>/cdap-job/cached-artifacts/[cdap-version] folder and then run a pipeline with Spark. This causes a new application.jar to be generated with the correct classes.
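The deletion step above can be scripted with the gsutil CLI. A minimal sketch; the bucket name and CDAP version below are placeholders, and the command only runs when dry_run is disabled:

```python
import subprocess

def cached_jar_path(bucket, cdap_version):
    """GCS path of the cached application.jar for a given CDAP version."""
    return f"gs://{bucket}/cdap-job/cached-artifacts/{cdap_version}/application.jar"

def delete_cached_jar(bucket, cdap_version, dry_run=True):
    """Delete the cached jar so the next Spark run regenerates it.

    Requires the gsutil CLI to be installed and authenticated.
    With dry_run=True the command is returned without being executed.
    """
    cmd = ["gsutil", "rm", cached_jar_path(bucket, cdap_version)]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# "my-bucket" and "6.8.0" are hypothetical examples.
print(delete_cached_jar("my-bucket", "6.8.0"))
```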

Release Notes

Fixed an issue that sometimes caused pipelines running on GCP Dataproc to fail with the following error: Unsupported program type: Spark. The first time a pipeline that contained only actions ran on a newly created or upgraded instance, it succeeded. However, subsequent pipeline runs that included sources or sinks might have failed with this error.

Activity

Masoud Saeida Ardekani March 10, 2023 at 10:26 PM

Note that disabling caching may impact pipeline start time, since artifacts are no longer cached in GCS.

Fixed
Details

Triaged: Yes

Created March 10, 2023 at 9:53 PM
Updated April 5, 2023 at 4:38 PM
Resolved March 21, 2023 at 5:48 AM