Preview fails consistently with an unexpected "file not found" error

Description

On the latest 6.2.0 build, I'm seeing consistent errors when trying to preview a very simple pipeline.

I have attached the pipeline and the log file.

Release Notes

Fixed a bug where concurrent preview runs failed because the SparkConf for a new preview run was populated with configurations from a previously started, still in-progress preview run.

Attachments (2)

Activity

Sagar Kapare May 13, 2020 at 5:41 PM

Verified the fix on the latest 6.2 build using the steps mentioned in the comment above, with a simple GCS-to-GCS pipeline. The same pipeline was used to reproduce the issue. Closing this bug.

Sagar Kapare May 6, 2020 at 11:24 PM

Validated the fix by running concurrent preview runs.

Without the fix, the issue reproduced consistently with 2-3 concurrent preview runs. The preview logs also showed that a `spark.jars` configuration parameter containing the cdapSparkJob.jar of another preview run was being passed to spark-submit.

After the fix, no `spark.jars` argument is passed to spark-submit, and concurrent previews succeed without any errors.

Sagar Kapare May 4, 2020 at 7:28 PM

Discussed offline with

We rewrite the SparkSubmit call so that it does not write to System.setProperty [1]. However, in Spark 2.3 the System.setProperty call was removed by PR [2]; the property is now set through Scala's `sys.props` in the `JavaMainApplication` class [3]. We will need to rewrite that class as well to fix this issue.

 [1] https://github.com/cdapio/cdap/blob/develop/cdap-spark-core-base/src/main/java/io/cdap/cdap/app/runtime/spark/classloader/SparkClassRewriter.java#L159

[2] https://github.com/apache/spark/pull/19519

[3] https://github.com/apache/spark/blob/v2.3.0/core/src/main/scala/org/apache/spark/deploy/SparkApplication.scala#L49
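
As a minimal sketch (illustrative only; the class below is hypothetical, not CDAP or Spark source) of why intercepting System.setProperty in SparkSubmit alone no longer helps: Scala's `sys.props` is a mutable view over the JVM system properties, so a write made in `JavaMainApplication` mutates the same global state, just from a class the rewriter does not yet touch.

```java
// Illustrative sketch only (hypothetical class, not CDAP/Spark code).
// In Spark 2.3, `sys.props("spark.app.name") = "preview-A"` inside
// JavaMainApplication has the same effect as the call below, but it runs in
// a class that SparkClassRewriter does not rewrite, so the property still
// leaks into the JVM-global state shared by concurrent preview runs.
public class SysPropsLeak {
    public static void main(String[] args) {
        // Scala's sys.props delegates to the JVM system properties:
        System.setProperty("spark.app.name", "preview-A");

        // Any other code in the same JVM now observes the leaked value.
        System.out.println(System.getProperty("spark.app.name")); // preview-A
    }
}
```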

Sagar Kapare May 4, 2020 at 12:42 PM

The root cause of the issue is as follows:

When a Spark program is submitted, command-line arguments passed to spark-submit via the `--conf` flag whose keys are prefixed with `spark.` are converted into JVM system properties by spark-submit.

CDAP passes several such properties, for example `spark.submit.deployMode`, `spark.executor.id`, `spark.app.id`, `spark.executor.cores`, `spark.extraListeners`, `spark.app.name`, and `spark.jars`.
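
As a rough illustration (hypothetical helper and values; this is not Spark's actual implementation), the effect is as if spark-submit did the following before running the user class:

```java
import java.util.Map;

// Rough illustration (hypothetical helper, not Spark's actual code): every
// `--conf spark.key=value` argument effectively becomes a JVM system property
// in the spark-submit process before the user class runs.
public class ConfToSystemProps {

    static void applySparkConfs(Map<String, String> confs) {
        for (Map.Entry<String, String> e : confs.entrySet()) {
            if (e.getKey().startsWith("spark.")) {
                System.setProperty(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        applySparkConfs(Map.of(
            "spark.submit.deployMode", "client",
            "spark.app.name", "preview-A",
            "spark.jars", "/tmp/preview-A/cdapSparkJob.jar"));
        System.out.println(System.getProperty("spark.jars"));
    }
}
```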

These properties are explicitly removed from the system properties by a cleanup task once the Spark job completes [1].

When a new run of a Spark pipeline is submitted, a SparkConf object is created in the initialize method of ETLSpark [2]. By default, Spark loads the system properties into the returned SparkConf [3]. If this SparkConf is created before the previous preview run has been cleaned up, the resulting SparkConf will contain properties from the older run. This incorrect SparkConf then overrides the values for the current preview run, on the assumption that they were user-supplied [4].

As a result, the new preview run executes with the configurations from the older preview run; it then tries to access files created for the older run and fails.

[1]  https://github.com/cdapio/cdap/blob/develop/cdap-spark-core-base/src/main/java/io/cdap/cdap/app/runtime/spark/SparkRuntimeService.java#L850

[2] https://github.com/cdapio/cdap/blob/develop/cdap-app-templates/cdap-etl/hydrator-spark-core-base/src/main/java/io/cdap/cdap/etl/spark/batch/ETLSpark.java#L96

[3]  https://github.com/apache/spark/blob/v2.3.3/core/src/main/scala/org/apache/spark/SparkConf.scala#L73

[4] https://github.com/cdapio/cdap/blob/558acff85a2dd64f9961d983f6fb337657fcf6c2/cdap-spark-core-base/src/main/java/io/cdap/cdap/app/runtime/spark/SparkRuntimeService.java#L523
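
A minimal sketch of the resulting race (hypothetical jar path and timing; only the SparkConf behavior cited in [3] is real):

```java
import org.apache.spark.SparkConf;

// Minimal sketch of the race with hypothetical values. `new SparkConf(true)`
// copies every "spark.*" JVM system property at construction time [3].
public class StaleSparkConfDemo {
    public static void main(String[] args) {
        // Preview run A's spark-submit has set its properties globally...
        System.setProperty("spark.jars", "/tmp/preview-A/cdapSparkJob.jar");

        // ...and before run A's cleanup task [1] clears them, run B builds
        // its SparkConf with loadDefaults = true, as described for ETLSpark
        // above [2]:
        SparkConf confForRunB = new SparkConf(true);

        // Run B now points at run A's jar; once run A's files are deleted,
        // run B fails with the reported "file not found" error.
        System.out.println(confForRunB.get("spark.jars"));
    }
}
```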

Details

Resolution: Fixed

Created April 30, 2020 at 8:32 PM
Updated December 9, 2020 at 8:59 PM
Resolved May 6, 2020 at 11:25 PM