Running explore query on TPFS Parquet Dataset with Map as one of the fields throws an exception in the Explore service
Activity
Gokul Gunasekaran January 11, 2016 at 7:24 PM
The problem doesn't occur in HDP 2.3, i.e., Hive versions >= 1.2.1.
Gokul Gunasekaran January 11, 2016 at 7:19 PM
@Sreevatsan Raman I am not sure that HIVE-9605 fixed it, since it looks more like an optimization of existing code; it was probably some other fix. I will see if I can find the exact JIRA, but I will resolve this issue.
Sreevatsan Raman January 10, 2016 at 9:28 PM
If this works with HDP 2.3 let's resolve this issue.
Note: the fix version in the Hive JIRA is 1.3.0, whereas HDP 2.3 ships Hive 1.2.1, so it is possible that the fix was backported.
Gokul Gunasekaran January 10, 2016 at 4:23 AM (edited)
This looks like a Hive bug that is fixed in Hive 1.2 (it might have been fixed even earlier); at least, I didn't see the problem in HDP 2.3. @Sreevatsan Raman Can we resolve this issue? I will see if I can find the relevant JIRA in the Hive JIRA system. Confirmed that this problem no longer occurs with either TPFSAvro or TPFSParquet.
Gokul Gunasekaran January 8, 2016 at 9:11 PM (edited)
This might have been fixed by HIVE-9605 (https://issues.apache.org/jira/browse/HIVE-9605). I have to do a bit more research, but this seems to be a Hive bug. I will test this on a cluster with Hive version > 1.2 if any of the distro versions we support contains it.
To replicate, create an ETLBatch adapter from a Stream to TPFSParquet. The Stream takes CSV input, say name (string) and age (int); in the TPFSParquet sink, don't drop any fields (ts, headers, name, age). Create the adapter, ingest a few events into the Stream (e.g., ABC,23), and then start the adapter. Once a run has completed, issue the SQL query select * from (datasetname) and try to view the results; the exception below is thrown. If you retry after dropping the headers field, there is no problem.
2015-08-03 13:43:55,921 - ERROR [executor-19:c.c.c.e.e.QueryExecutorHttpHandler@190] - Got exception: java.lang.RuntimeException: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-13.0.1.jar:na]
    at co.cask.cdap.explore.service.hive.BaseHiveExploreService.fetchNextResults(BaseHiveExploreService.java:798) ~[classes/:na]
    at co.cask.cdap.explore.service.hive.BaseHiveExploreService.previewResults(BaseHiveExploreService.java:836) ~[classes/:na]
    at co.cask.cdap.explore.executor.QueryExecutorHttpHandler.getQueryResultPreview(QueryExecutorHttpHandler.java:173) ~[classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_60]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_60]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_60]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_60]
    at co.cask.http.HttpMethodInfo.invoke(HttpMethodInfo.java:85) [netty-http-0.11.0.jar:na]
    at co.cask.http.HttpDispatcher.messageReceived(HttpDispatcher.java:41) [netty-http-0.11.0.jar:na]
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.6.6.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.6.Final.jar:na]
    at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [netty-3.6.6.Final.jar:na]
    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [netty-3.6.6.Final.jar:na]
    at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [netty-3.6.6.Final.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343) ~[hive-service-1.1.0.jar:1.1.0]
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250) ~[hive-service-1.1.0.jar:1.1.0]
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656) ~[hive-service-1.1.0.jar:1.1.0]
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451) ~[hive-service-1.1.0.jar:1.1.0]
    at co.cask.cdap.explore.service.hive.Hive14ExploreService.doFetchNextResults(Hive14ExploreService.java:69) ~[classes/:na]
    at co.cask.cdap.explore.service.hive.BaseHiveExploreService.fetchNextResults(BaseHiveExploreService.java:793) ~[classes/:na]
    ... 17 common frames omitted
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338) ~[hive-service-1.1.0.jar:1.1.0]
    ... 22 common frames omitted
Caused by: java.lang.NullPointerException: null
    at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:381) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:367) ~[parquet-hadoop-1.6.0rc3.jar:na]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:228) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:84) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:71) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:667) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323) ~[hive-exec-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445) ~[hive-exec-1.1.0.jar:1.1.0]
    ... 26 common frames omitted
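The innermost frame suggests that ParquetMetadataConverter.fromParquetStatistics dereferences a column chunk's statistics without guarding against their absence, which would explain why a Map field (whose chunks may carry no statistics) triggers the NPE while dropping the field avoids it. As an illustration only, and not the actual parquet-hadoop code, the failure mode and the defensive guard a fix would add look roughly like this (ColumnStats, unguardedMinLength, and guardedMinLength are hypothetical names):

```java
// Illustrative sketch of the null-statistics pattern behind the NPE.
// ColumnStats stands in for the Thrift statistics object; it is NOT the
// real parquet-hadoop API.
public class StatisticsGuard {
    static class ColumnStats {
        byte[] min; // may be null when the writer emitted no statistics
        byte[] max;
    }

    // Unguarded conversion, analogous to the failing code path: throws
    // NullPointerException when a column chunk carries no statistics.
    static int unguardedMinLength(ColumnStats stats) {
        return stats.min.length;
    }

    // Guarded conversion: treat absent statistics as empty instead of failing.
    static int guardedMinLength(ColumnStats stats) {
        if (stats == null || stats.min == null) {
            return 0;
        }
        return stats.min.length;
    }

    public static void main(String[] args) {
        ColumnStats missing = new ColumnStats(); // no statistics written
        boolean threw = false;
        try {
            unguardedMinLength(missing);
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println(threw);                   // true
        System.out.println(guardedMinLength(missing)); // 0
    }
}
```

This mirrors why the query succeeds once the headers field is dropped: the remaining columns all have statistics, so the unguarded path never sees a null.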