BigQuery Sink plugin fails to insert PostgreSQL Timestamp type with value '0001-01-01 01:00:00.000'

Description

The BigQuery Sink plugin fails to insert a PostgreSQL Timestamp value of '0001-01-01 01:00:00.000'; the following exception is thrown in the sink node.

A sample JSON is attached.

2023-04-04 07:03:22,922 - ERROR [Driver:o.a.s.i.i.SparkHadoopWriter@94] - Aborting job job_202304040702228773670560368484570_0003.
java.io.IOException: Failed to import GCS into BigQuery.
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryOutputFormat$BigQueryOutputCommitter.commitJob(BigQueryOutputFormat.java:217)
	at io.cdap.cdap.etl.spark.io.TrackingOutputCommitter.commitJob(TrackingOutputCommitter.java:51)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:184)
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:99)
	at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1077)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1075)
	at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:833)
	at io.cdap.cdap.etl.spark.batch.RDDUtils.saveHadoopDataset(RDDUtils.java:58)
	at io.cdap.cdap.etl.spark.batch.RDDUtils.saveUsingOutputFormat(RDDUtils.java:47)
	at io.cdap.cdap.etl.spark.batch.SparkBatchSinkFactory.writeFromRDD(SparkBatchSinkFactory.java:200)
	at io.cdap.cdap.etl.spark.batch.BaseRDDCollection$1.run(BaseRDDCollection.java:238)
	at io.cdap.cdap.etl.spark.SparkPipelineRunner.executeSinkRunnables(SparkPipelineRunner.java:210)
	at io.cdap.cdap.etl.spark.SparkPipelineRunner.processDag(SparkPipelineRunner.java:202)
	at io.cdap.cdap.etl.spark.SparkPipelineRunner.runPipeline(SparkPipelineRunner.java:183)
	at io.cdap.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:260)
	at io.cdap.cdap.app.runtime.spark.SparkTransactional$2.run(SparkTransactional.java:236)
	at io.cdap.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:208)
	at io.cdap.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:138)
	at io.cdap.cdap.app.runtime.spark.AbstractSparkExecutionContext.execute(AbstractSparkExecutionContext.scala:231)
	at io.cdap.cdap.app.runtime.spark.SerializableSparkExecutionContext.execute(SerializableSparkExecutionContext.scala:63)
	at io.cdap.cdap.app.runtime.spark.DefaultJavaSparkExecutionContext.execute(DefaultJavaSparkExecutionContext.scala:94)
	at io.cdap.cdap.api.Transactionals.execute(Transactionals.java:63)
	at io.cdap.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:189)
	at io.cdap.cdap.app.runtime.spark.SparkMainWrapper$.main(SparkMainWrapper.scala:88)
	at io.cdap.cdap.app.runtime.spark.SparkMainWrapper.main(SparkMainWrapper.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
Caused by: java.io.IOException: Error occurred while importing data to BigQuery 'Error while reading data, error message: Invalid timestamp value '-62135766000000000' for field 'col1' of type 'long' File: bigstore/877f83ed-f148-430a-91ce-ce6768443f6b/877f83ed-f148-430a-91ce-ce6768443f6b/input/PostgreSQLTimestamp-877f83ed-f148-430a-91ce-ce6768443f6b/part-r-00000.avro'. There are total 1 error(s) for BigQuery job 8b5778c5-7f04-45ce-bf8b-fd4da296f9e2. Please look at BigQuery job logs for more information.
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryOutputFormat$BigQueryOutputCommitter.waitForJobCompletion(BigQueryOutputFormat.java:520)
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryOutputFormat$BigQueryOutputCommitter.triggerBigqueryJob(BigQueryOutputFormat.java:417)
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryOutputFormat$BigQueryOutputCommitter.importFromGcs(BigQueryOutputFormat.java:382)
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryOutputFormat$BigQueryOutputCommitter.commitJob(BigQueryOutputFormat.java:213)
	... 33 common frames omitted
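The rejected value '-62135766000000000' is below BigQuery's minimum TIMESTAMP of 0001-01-01 00:00:00 UTC (-62135596800000000 µs), and it sits exactly 2 days earlier than the proleptic-Gregorian encoding of '0001-01-01 01:00:00' UTC. That 2-day gap matches the difference at year 1 between the legacy hybrid Julian/Gregorian calendar used by java.sql.Timestamp and the proleptic Gregorian calendar that Avro's timestamp-micros logical type (and BigQuery) assume, which is one plausible explanation for the underflow. A sketch of the arithmetic (not from the report; it assumes the value was serialized as UTC):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_micros(dt: datetime) -> int:
    # Exact integer microseconds since the Unix epoch; Python's datetime
    # uses the proleptic Gregorian calendar, like Avro timestamp-micros.
    return (dt - EPOCH) // timedelta(microseconds=1)

# BigQuery's minimum TIMESTAMP: 0001-01-01 00:00:00 UTC.
bq_min = to_epoch_micros(datetime(1, 1, 1, tzinfo=timezone.utc))

# The source value '0001-01-01 01:00:00', interpreted as UTC, proleptic Gregorian.
gregorian_value = to_epoch_micros(datetime(1, 1, 1, 1, 0, tzinfo=timezone.utc))

# The value BigQuery rejected, taken from the error message above.
rejected = -62135766000000000

print(bq_min)                      # -62135596800000000
print(gregorian_value)             # -62135593200000000 (in range)
print(rejected < bq_min)           # True: below BigQuery's minimum
print(gregorian_value - rejected)  # 172800000000 µs = exactly 2 days
```

So the proleptic-Gregorian encoding of the same wall-clock instant would have been accepted; the written value appears shifted by the 2-day Julian/Gregorian calendar offset at year 1.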

Release Notes

None

Attachments

1 attachment • 04 Apr 2023, 07:07 AM

Details
Created April 4, 2023 at 7:07 AM
Updated April 4, 2023 at 7:07 AM