Python transform native mode is broken

Description

To reproduce, run any pipeline that uses the Python transform in ‘native’ mode. It fails with:

Caused by: java.nio.file.NoSuchFileException: file:/hadoop/yarn/nm-local-dir/usercache/yarn/appcache/application_1683831012222_0004/container_1683831012222_0004_01_000003/data/tmp/1683832577274-0/1683832582815-0/%20artifact6375801250498470927.jar!/pythonEvaluator.py
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.Files.readAllBytes(Files.java:3152)
    at io.cdap.plugin.python.transform.Py4jPythonExecutor.prepareTempFiles(Py4jPythonExecutor.java:110)
    at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:140)
    at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:160)
    at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:72)

I’m not yet sure whether this requires a platform-level fix or whether a plugin-only fix is possible.
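
The path in the NoSuchFileException above starts with `file:`, contains `.jar!/`, and keeps `%20` undecoded. That is exactly the shape `URL.getPath()` returns for a resource packaged inside a jar, which suggests (the issue does not confirm it) that the plugin turns a `getResource()` URL into a filesystem `Path`. The sketch below reproduces that pattern against a throwaway jar and contrasts it with the `getResourceAsStream()` approach mentioned later in this issue. The class and method names (`JarResourceDemo`, `readViaUrlPath`, `copyViaStream`) are hypothetical illustrations, not the plugin's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarResourceDemo {
  static final String RESOURCE = "pythonEvaluator.py";

  /** Test fixture: builds a throwaway jar holding a single pythonEvaluator.py entry. */
  public static Path buildJar(String contents) throws IOException {
    Path jar = Files.createTempFile("artifact", ".jar");
    try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
      out.putNextEntry(new JarEntry(RESOURCE));
      out.write(contents.getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    }
    return jar;
  }

  /** Failing pattern (hypothetical): treat getResource(...).getPath() as a file path. */
  public static boolean readViaUrlPath(ClassLoader loader) {
    URL url = loader.getResource(RESOURCE);
    if (url == null) {
      return false;
    }
    // Inside a jar the URL is "jar:file:/...jar!/pythonEvaluator.py", so getPath()
    // returns "file:/...jar!/pythonEvaluator.py", which is not a file on disk.
    try {
      Files.readAllBytes(Paths.get(url.getPath()));
      return true;
    } catch (IOException | InvalidPathException e) {
      return false; // NoSuchFileException on Linux, as in the trace above
    }
  }

  /** Working pattern: stream the jar entry and copy it to a real file on disk. */
  public static Path copyViaStream(ClassLoader loader, Path dir) throws IOException {
    try (InputStream in = loader.getResourceAsStream(RESOURCE)) {
      if (in == null) {
        throw new IOException("Resource not found: " + RESOURCE);
      }
      Path target = dir.resolve(RESOURCE);
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      return target;
    }
  }

  public static void main(String[] args) throws IOException {
    Path jar = buildJar("print('hello')\n");
    Path dir = Files.createTempDirectory("py-eval");
    // Parent is null so that only the jar is searched for the resource.
    try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
      System.out.println("getResource().getPath() readable: " + readViaUrlPath(loader));
      Path copied = copyViaStream(loader, dir);
      System.out.println("getResourceAsStream() copy: " + Files.readAllLines(copied));
    }
  }
}
```

Copying the stream to a temporary directory keeps the rest of the flow unchanged if, as seems likely, the Python interpreter needs a real file to open.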

Release Notes

None

Attachments

1 attachment (15 May 2023, 05:32 PM)

Activity

Albert Shau, August 28, 2023 at 4:17 PM

Albert Shau, May 15, 2023 at 5:32 PM

Attaching the pipeline used to reproduce the issue. Dataproc has Python installed by default, so no init action is needed to install it.

Albert Shau, May 15, 2023 at 5:00 PM (edited)

This particular exception can be fixed by changing the code to use getResourceAsStream() instead of getResource(), but after that I see what looks like a larger, more complicated class-loading issue:

2023-05-15 16:57:55,613 - ERROR [Executor task launch worker for task 0.0 in stage 0.0 (TID 0):o.a.s.u.Utils@94] - Aborting task
com.google.common.util.concurrent.ExecutionError: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2256)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4793)
    at io.cdap.cdap.etl.spark.function.FunctionCache$EnabledCache.getValue(FunctionCache.java:122)
    at io.cdap.cdap.etl.spark.function.PluginFunctionContext.createAndInitializePlugin(PluginFunctionContext.java:105)
    at io.cdap.cdap.etl.spark.function.PluginFunctionContext.createAndInitializePlugin(PluginFunctionContext.java:114)
    at io.cdap.cdap.etl.spark.function.TransformFunction.call(TransformFunction.java:47)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:135)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:134)
    at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:505)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:508)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/spark/repl/ExecutorClassLoader) previously initiated loading for a different type with name "io/cdap/plugin/common/script/ScriptContext"
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at com.sun.proxy.$Proxy47.<clinit>(Unknown Source)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:739)
    at py4j.Gateway.createProxy(Gateway.java:368)
    at py4j.CallbackClient.getPythonServerEntryPoint(CallbackClient.java:418)
    at py4j.GatewayServer.getPythonServerEntryPoint(GatewayServer.java:803)
    at io.cdap.plugin.python.transform.Py4jPythonExecutor.initialize(Py4jPythonExecutor.java:199)
    at io.cdap.plugin.python.transform.PythonEvaluator.initialize(PythonEvaluator.java:165)
    at io.cdap.cdap.etl.common.plugin.WrappedTransform.lambda$initialize$3(WrappedTransform.java:74)
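
The second trace shows py4j building a `java.lang.reflect.Proxy` while Spark's `ExecutorClassLoader` has already initiated loading of a different `ScriptContext` type. One direction that might be worth testing, assuming py4j chooses the class loader for that proxy from the thread context class loader (an assumption about py4j's default behaviour, not verified against the version the plugin bundles), is to install the plugin's own class loader as the context class loader around the entry-point lookup. The helper below is a minimal, hypothetical sketch of that swap-and-restore pattern; `ContextClassLoaders.callWith` is not an existing CDAP or py4j API.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

public final class ContextClassLoaders {
  private ContextClassLoaders() {}

  /**
   * Runs action with loader installed as the thread context class loader and
   * always restores the previous loader, even if action throws.
   */
  public static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
    Thread thread = Thread.currentThread();
    ClassLoader previous = thread.getContextClassLoader();
    thread.setContextClassLoader(loader);
    try {
      return action.call();
    } finally {
      thread.setContextClassLoader(previous);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the plugin's class loader (hypothetical; any distinct loader works here).
    ClassLoader pluginLoader =
        new URLClassLoader(new URL[0], ContextClassLoaders.class.getClassLoader());
    ClassLoader seen = callWith(pluginLoader, () -> Thread.currentThread().getContextClassLoader());
    System.out.println("installed inside action: " + (seen == pluginLoader));
    System.out.println("restored afterwards: "
        + (Thread.currentThread().getContextClassLoader() != pluginLoader));
  }
}
```

Hypothetically, the `GatewayServer.getPythonServerEntryPoint(...)` call seen at `Py4jPythonExecutor.initialize` would be wrapped as `callWith(getClass().getClassLoader(), () -> gatewayServer.getPythonServerEntryPoint(interfaces))`, where `gatewayServer` and `interfaces` stand for whatever the plugin already passes. Whether this avoids the loader constraint violation is untested.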

Unresolved

Details

Created May 15, 2023 at 4:22 PM
Updated March 27, 2024 at 7:36 AM