Python transform native mode is broken
Description
Release Notes: None
Attachments: 1 (15 May 2023, 05:32 PM)
Activity
Albert Shau, August 28, 2023 at 4:17 PM
Albert Shau, May 15, 2023 at 5:32 PM
Attaching the pipeline used to reproduce the issue. Dataproc has Python installed by default, so no init action is needed to install it.
Albert Shau, May 15, 2023 at 5:00 PM (edited)
This particular exception can be fixed by changing the code ( ) to use getResourceAsStream() instead of getResource(), but after that I run into what looks like a larger, more complicated classloading issue:
2023-05-15 16:57:55,613 - ERROR [Executor task launch worker for task 0.0 in stage 0.0 (TID 0):o.a.s.u.Utils@94] - Aborting task
com.google.common.util.concurrent.ExecutionError: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2256)
at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4793)
at io.cdap.cdap.etl.spark.function.FunctionCache$EnabledCache.getValue(FunctionCache.java:122)
at io.cdap.cdap.etl.spark.function.PluginFunctionContext.createAndInitializePlugin(PluginFunctionContext.java:105)
at io.cdap.cdap.etl.spark.function.PluginFunctionContext.createAndInitializePlugin(PluginFunctionContext.java:114)
at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:47)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:135)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:134)
at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:505)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:508)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.sun.proxy.$Proxy47.<clinit>(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
at py4j.Gateway.createProxy(Gateway.java:368)
at py4j.CallbackClient.getPythonServerEntryPoint(CallbackClient.java:418)
at py4j.GatewayServer.getPythonServerEntryPoint(GatewayServer.java:803)
at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:199)
at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:165)
at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:74)
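For reference, the getResourceAsStream() change described above could look roughly like the following sketch. This is not the actual Py4jPythonExecutor code; readResourceBytes is a hypothetical helper written for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResourceReader {

    // Reads a classpath resource into a byte array. getResource() returns a
    // "jar:file:...!/..." URL for jar-packaged resources, which cannot be
    // opened as a filesystem path; getResourceAsStream() streams the entry
    // through the classloader, so it works whether the resource sits in a
    // directory on disk or inside a jar.
    public static byte[] readResourceBytes(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("resource not found: " + name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The buffered copy loop is used instead of InputStream.readAllBytes() so the sketch stays Java 8 compatible.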
Resolution: Unresolved
Details
Assignee: Ankit Jain
Reporter: Albert Shau
Labels:
Fix versions:
Priority:
Original estimate:
Created: May 15, 2023 at 4:22 PM
Updated: March 27, 2024 at 7:36 AM
To reproduce, run any pipeline that uses the Python transform in 'native' mode. It will fail with:
Caused by: java.nio.file.NoSuchFileException: file:/hadoop/yarn/nm-local-dir/usercache/yarn/appcache/application_1683831012222_0004/container_1683831012222_0004_01_000003/data/tmp/1683832577274-0/1683832582815-0/%20artifact6375801250498470927.jar!/pythonEvaluator.py
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.Files.readAllBytes(Files.java:3152)
at io.cdap.plugin.python.transform.Py4jPythonExecutor.prepareTempFiles(Py4jPythonExecutor.java:110)
at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:140)
at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:160)
at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:72)
I’m not yet sure whether this requires a platform-level fix, or whether a plugin-only fix is possible.
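As a sketch of why the path-based read fails: getResource() on a jar-packaged resource returns a jar: URL whose path contains "!/", which is not a real filesystem path, so treating it as one raises the NoSuchFileException seen above, while getResourceAsStream() reads the entry through the classloader. The demo below builds a throwaway jar to show both behaviors; all names in it (JarResourceDemo, the jar contents) are illustrative, not taken from the plugin.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarResourceDemo {

    public static String demo() throws IOException {
        // Build a tiny jar containing a pythonEvaluator.py entry.
        Path jar = Files.createTempFile("artifact", ".jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar))) {
            jos.putNextEntry(new JarEntry("pythonEvaluator.py"));
            jos.write("print('hi')".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        String result;
        try (URLClassLoader loader = new URLClassLoader(new URL[]{jar.toUri().toURL()}, null)) {
            URL url = loader.getResource("pythonEvaluator.py");
            // url is "jar:file:/...artifact....jar!/pythonEvaluator.py"; its path
            // component ("file:/...!/...") is not a real file, so a
            // Files-based read fails just like in the reported stack trace:
            String failure = "";
            try {
                Files.readAllBytes(Paths.get(url.getPath()));
            } catch (NoSuchFileException e) {
                failure = "NoSuchFileException";
            }
            // Reading through the classloader works regardless of packaging:
            try (InputStream in = loader.getResourceAsStream("pythonEvaluator.py")) {
                result = failure + "/" + new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(jar);
        }
        return result;
    }
}
```

The same contrast is why a plugin-side switch to getResourceAsStream() avoids this particular exception, independent of any platform change.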