Explore queries in distributed mode cannot find guava-13 classes

Description

DatasetFramework now needs guava-13 classes when loading datasets. However, the Explore classpath begins with the HBase jars, which bring guava-11 along with them, so the older Guava classes are found first.
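
To confirm which copy of a Guava class wins, one option is to list every location on the classpath that provides it. The sketch below uses only the JDK (`ClassLoader.getResources`), so it runs even when Guava is missing; the class name `FindClassCopies` is illustrative, and running it inside the Explore JVM or map task is an assumption about how you would deploy it. With the standard parent-first loaders, the first URL printed is typically the copy that actually gets loaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class FindClassCopies {

    /**
     * Lists every location the loader can see that provides the given class,
     * in the order the loader reports them (parent loaders first).
     */
    public static List<URL> locate(ClassLoader loader, String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the IllegalAccessError below.
        String name = args.length > 0 ? args[0] : "com.google.common.hash.HashCodes";
        List<URL> copies = locate(FindClassCopies.class.getClassLoader(), name);
        if (copies.isEmpty()) {
            System.out.println(name + " was not found on the classpath");
        }
        for (URL url : copies) {
            System.out.println(url);
        }
    }
}
```

If two URLs come back for `HashCodes`, one pointing into a guava-11 jar (or a fat jar bundling it) ahead of guava-13, that ordering explains the error.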

Exception stack trace:

2014-12-15 23:05:52,887 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IllegalAccessError: tried to access class com.google.common.hash.HashCodes from class co.cask.cdap.data2.datafabric.dataset.type.DistributedDatasetTypeClassLoaderFactory
at co.cask.cdap.data2.datafabric.dataset.type.DistributedDatasetTypeClassLoaderFactory.create(DistributedDatasetTypeClassLoaderFactory.java:112)
at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDatasetType(RemoteDatasetFramework.java:274)
at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDataset(RemoteDatasetFramework.java:181)
at co.cask.cdap.hive.datasets.DatasetAccessor.firstLoad(DatasetAccessor.java:207)
at co.cask.cdap.hive.datasets.DatasetAccessor.instantiate(DatasetAccessor.java:186)
at co.cask.cdap.hive.datasets.DatasetAccessor.instantiate(DatasetAccessor.java:157)
at co.cask.cdap.hive.datasets.DatasetAccessor.getRecordScannable(DatasetAccessor.java:56)
at co.cask.cdap.hive.datasets.DatasetInputFormat.getRecordReader(DatasetInputFormat.java:76)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:237)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:542)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Release Notes

None

Linked issues

relates to

CDAP-1040

Common Dependencies with Hive cause issues

Activity


Albert Shau December 17, 2014 at 12:06 AM

We will do a short-term "fix" to get it working, and I will open another JIRA to track a longer-term fix.

https://github.com/caskdata/cdap/pull/837

Albert Shau December 16, 2014 at 11:48 PM

Looks like Spark shaded its guava dependency to work around this problem (HIVE-7387).

Albert Shau December 16, 2014 at 11:47 PM

The problem is that when Hive launches an MR job, it creates a JobConf using the ExecDriver class, so Hadoop builds job.jar from the jar that contains ExecDriver.class, which is the fat jar hive-exec.jar. That fat jar bundles guava, and it is the first jar on the Hadoop classpath.
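
The ordering problem can be modelled with a toy resolver: on a flat classpath, the first entry that contains a class supplies it, and any later copies are simply shadowed. The jar names below, and the Guava version assumed to be bundled in hive-exec.jar, are illustrative assumptions rather than values read from the actual job classpath.

```java
import java.util.Arrays;
import java.util.List;

public class ClasspathOrderDemo {

    /** Hypothetical model of a classpath entry: a jar and the Guava version it bundles (null if none). */
    public static final class Entry {
        public final String jar;
        public final String guavaVersion;

        public Entry(String jar, String guavaVersion) {
            this.jar = jar;
            this.guavaVersion = guavaVersion;
        }
    }

    /** Returns the first entry that bundles Guava; later copies are never consulted. */
    public static Entry resolveGuava(List<Entry> classpath) {
        for (Entry e : classpath) {
            if (e.guavaVersion != null) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed ordering: hive-exec.jar (the job.jar source) comes first.
        List<Entry> classpath = Arrays.asList(
            new Entry("hive-exec.jar", "11"),         // assumed bundled version
            new Entry("cdap-data-fabric.jar", null),  // hypothetical, no Guava inside
            new Entry("guava-13.0.1.jar", "13"));
        Entry winner = resolveGuava(classpath);
        // Prints: Guava loaded from hive-exec.jar (version 11)
        System.out.println("Guava loaded from " + winner.jar + " (version " + winner.guavaVersion + ")");
    }
}
```

Because guava-13 is present but shadowed, CDAP code compiled against it fails only when it touches something the older copy lacks or does not expose.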

We need to figure out a real fix for this, but in the meantime we can try removing the problematic guava calls.
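
The stack trace only shows that DistributedDatasetTypeClassLoaderFactory touches `com.google.common.hash.HashCodes`; it does not show what that call computes. Assuming, purely for illustration, that it produces a hex-encoded MD5 digest, the same result can come from the JDK's `MessageDigest`, which takes Guava out of that code path entirely. `JdkHashing`, `toHex`, and `md5Hex` are hypothetical names, not the code from the actual fix.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class JdkHashing {

    /** Hex-encodes bytes using only the JDK (no Guava HashCode/HashCodes). */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** MD5 of a UTF-8 string, hex-encoded; an assumed stand-in for the Guava hashing call. */
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return toHex(md.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```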

Poorna Chandra December 16, 2014 at 1:20 PM

Sorry for not including more details on how to reproduce this. We are using CDH 5.1 and CDAP 2.6.0-SNAPSHOT. Any custom Dataset query in Explore causes the exception. Let me know if you need more details.

Albert Shau December 16, 2014 at 3:14 AM

Hm, we can add our guava version instead of HBase's, but this could still cause problems like the one Terence mentioned, where somebody includes their own application-specific guava. One fix at a time, though: let's use guava from CDAP.
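
A minimal sketch of "use guava from CDAP": assemble the classpath with CDAP's guava jar in front, so its classes shadow the copy bundled elsewhere. The jar names and the `preferGuava` helper are hypothetical, and how the Explore job classpath is actually assembled is not shown here. The older copy stays on the classpath, only shadowed, which is why an application shipping its own guava can still hit conflicts.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuavaFirstClasspath {

    /**
     * Builds a classpath string with the preferred Guava jar first, followed by
     * the remaining entries in their original order (duplicates of it dropped).
     */
    public static String preferGuava(String preferredGuavaJar, List<String> entries) {
        List<String> ordered = new ArrayList<>();
        ordered.add(preferredGuavaJar);
        for (String entry : entries) {
            if (!entry.equals(preferredGuavaJar)) {
                ordered.add(entry);
            }
        }
        return String.join(File.pathSeparator, ordered);
    }

    public static void main(String[] args) {
        // Hypothetical jar names standing in for the real Explore classpath.
        List<String> original = Arrays.asList("hive-exec.jar", "hbase-client.jar", "cdap-explore.jar");
        // On Unix-like systems prints:
        // guava-13.0.1.jar:hive-exec.jar:hbase-client.jar:cdap-explore.jar
        System.out.println(preferGuava("guava-13.0.1.jar", original));
    }
}
```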