Scala compute plugin with Spark 2 fails to run on Dataproc

Description

Release Notes

Fixed a bug that caused the Dynamic Spark plugins to fail when running on Dataproc 1.5.

Activity

Albert Shau July 31, 2023 at 9:51 PM

Re-opening since it hasn't been cherry-picked yet.

Albert Shau July 26, 2023 at 12:14 AM

Albert Shau July 19, 2023 at 10:03 PM

Removing scala as a direct dependency doesn’t prevent scala-library from getting pulled into the twill.jar, likely because it still gets traced from the Kafka dependencies. Filtering it out of the libraries within DataprocJobMain does fix this issue, but I need to see whether it makes more sense to filter it out at submission time rather than later in the launcher.
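A minimal sketch of the kind of filtering described above, assuming the launcher holds its libraries as a list of jar paths. The class name, method name, and file-name pattern are hypothetical and are not taken from DataprocJobMain:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual DataprocJobMain code.
class ScalaLibraryFilter {

  // Assumed naming convention: scala-library.jar or scala-library-<version>.jar
  private static final Pattern SCALA_LIBRARY =
      Pattern.compile("^scala-library(-.*)?\\.jar$");

  // Returns the given jar paths with any scala-library jar removed.
  static List<String> withoutScalaLibrary(List<String> jarPaths) {
    return jarPaths.stream()
        .filter(path -> !SCALA_LIBRARY.matcher(fileName(path)).matches())
        .collect(Collectors.toList());
  }

  // Strips the directory part so only the file name is matched.
  private static String fileName(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  public static void main(String[] args) {
    List<String> libs = Arrays.asList(
        "lib/scala-library-2.12.10.jar",
        "lib/kafka_2.12-2.0.0.jar",
        "lib/guava-20.0.jar");
    withoutScalaLibrary(libs).forEach(System.out::println);
  }
}
```

The same predicate could be applied when the jar is built at submission time instead of in the launcher; the sketch does not assume either location.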

Albert Shau July 19, 2023 at 7:12 PM

scala was added as a direct dependency in . We should not have added scala-library to a base class like this; it can cause all sorts of issues.

Albert Shau July 19, 2023 at 7:04 PM

I think this is because the twill.jar we create and ship to DataprocJobMain contains some scala libraries:

scala-library-2.12 in particular is the one with scala.collection.immutable.List, which is incompatible with scala 2.11 on the cluster.

I’m not sure why scala is getting bundled into the twill jar. I will see if we can exclude it.
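One way to confirm what is bundled is to list the Scala-related entries of the jar. The sketch below is an illustration only: the class name is hypothetical, and it assumes that Scala shows up either as classes under `scala/` or as a nested jar whose file name starts with `scala-library`.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper for inspecting a jar such as twill.jar.
class TwillJarInspector {

  // True for Scala classes (scala/...) or a nested scala-library jar (assumed layout).
  static boolean looksLikeScala(String entryName) {
    String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
    return entryName.startsWith("scala/") || fileName.startsWith("scala-library");
  }

  // Returns the names of all entries in the jar that look Scala-related.
  static List<String> scalaEntries(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.stream()
          .map(entry -> entry.getName())
          .filter(TwillJarInspector::looksLikeScala)
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage: java TwillJarInspector <path to jar>
    scalaEntries(args[0]).forEach(System.out::println);
  }
}
```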

Fixed
Details

Triaged

Yes

Created June 1, 2023 at 7:36 PM
Updated July 31, 2023 at 11:24 PM
Resolved July 31, 2023 at 11:24 PM