Running CDAP pipelines on a Spark 3-enabled Dataproc cluster fails with a class-not-found exception.

Description

Steps to reproduce:
1. Publish a pipeline (Database -> BigQuery).
2. Create a Dataproc profile with the image version set to `preview-debian10`.
3. Set the profile as the default for the pipeline and run it.

After step 3, the pipeline fails shortly after starting, with the following exception in the pipeline logs:

 

Release Notes

None

Activity

Terence Yim
February 17, 2021, 7:40 PM

Fix in

Terence Yim
January 21, 2021, 7:38 PM

Spark 3.1 changed how `yarn.application.classpath` is included in the application classpath. For now, the workaround is to set `spark.yarn.populateHadoopClasspath` to `true` in the engine config.
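
As a rough illustration of the workaround, the sketch below adds the property to an existing set of engine config key/value pairs and renders them one per line. The helper names (`with_classpath_workaround`, `to_properties`) and the sample `spark.executor.memory` entry are hypothetical and exist only for this example; they are not part of CDAP or Spark. Only the property name `spark.yarn.populateHadoopClasspath` and the value `true` come from the comment above.

```python
# Property named in the workaround; everything else here is illustrative.
WORKAROUND_KEY = "spark.yarn.populateHadoopClasspath"


def with_classpath_workaround(engine_config):
    """Return a copy of the engine config with the workaround property set.

    Hypothetical helper: the input dict is left unchanged.
    """
    updated = dict(engine_config)
    updated[WORKAROUND_KEY] = "true"
    return updated


def to_properties(engine_config):
    """Render the config as sorted key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(engine_config.items()))


if __name__ == "__main__":
    # Sample starting config; the memory setting is an assumed placeholder.
    existing = {"spark.executor.memory": "2g"}
    print(to_properties(with_classpath_workaround(existing)))
```

The helper returns a copy rather than mutating its argument, so the original engine config can be restored easily once the fix is in place and the workaround is no longer needed.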

Fixed

Assignee

Terence Yim

Reporter

Ajai Narayanan

Labels

None

Docs Impact

None

UX Impact

None

Components

Fix versions

Priority

Blocker