BigQuery sink is unable to write to existing BigQuery tables if the schema contains an integer field
If the input schema of the BigQuery sink contains an integer field, the sink cannot write records to existing tables and the pipeline fails with an exception.
To reproduce, create a pipeline whose BigQuery sink has an integer field in its input schema, then run it. If the table does not exist, the pipeline succeeds on the first run but fails on subsequent runs. If the table already exists, the pipeline always fails.
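A simplified model of why only the first run succeeds (the helper names below are illustrative, not the plugin's actual code): BigQuery stores every integer column as 64-bit INTEGER, which maps back to CDAP's long. If the sink replaces the configured schema with the schema read back from an existing table, a record carrying an int field no longer matches the expected long field.

```python
# Illustrative model only: not the plugin's real code.

# BigQuery has a single 64-bit INTEGER type, so reading back an existing
# table's schema turns any integer column into CDAP 'long'.
BQ_TO_CDAP = {"INTEGER": "long", "FLOAT64": "double", "STRING": "string"}
CDAP_TO_BQ = {"int": "INTEGER", "long": "INTEGER", "string": "STRING"}

def create_table(cdap_schema):
    """First run: the table is created from the pipeline's schema."""
    return {name: CDAP_TO_BQ[t] for name, t in cdap_schema.items()}

def override_with_table_schema(table):
    """Buggy prepareRun behaviour: the configured schema is replaced by
    the schema read back from the existing table."""
    return {name: BQ_TO_CDAP[t] for name, t in table.items()}

def validate(record_schema, expected_schema):
    """Strict field-type check, like building a record against a schema."""
    for name, t in record_schema.items():
        if expected_schema[name] != t:
            raise TypeError(
                f"field '{name}' expected {expected_schema[name]}, got {t}")

configured = {"id": "int", "name": "string"}

# First run: the table does not exist, so it is created from the
# configured schema and the write succeeds.
table = create_table(configured)
validate(configured, configured)  # ok

# Second run: the table exists; its INTEGER column round-trips to
# 'long', so the same record no longer matches the overridden schema.
expected = override_with_table_schema(table)
try:
    validate(configured, expected)
except TypeError as e:
    print(e)  # field 'id' expected long, got int
```

The same comparison explains the existing-table case: the override happens on every run because the table is already there, so the pipeline never succeeds.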
Fixes the BigQuery sink plugin issue in 6.4 where pipelines with INT or FLOAT CDAP fields may fail and Decimal fields may write the wrong value. This could cause existing pipelines to fail to deploy when upgraded from lower versions to 6.4.0. If these are validation errors, fix the issues; if there are datatype mismatch issues, correct the type or use the relevant transformations to convert values to the correct type.
I’ve added this blurb to the 6.4.0 release notes under Known Issues:
PLUGIN-678: Data pipelines that include BigQuery sinks version 0.17.0 fail or give incorrect results. This is fixed in BigQuery sink version 0.17.1, which is available for download in the Hub.
Workaround: In the Hub, download Google Cloud Platform version 0.17.1. For each pipeline, replace BigQuery sink plugins version 0.17.0 with BigQuery sink plugins version 0.17.1.
The configured schema was overridden for the BigQuery sink in prepareRun to prevent unexpected schema changes in the target table ( )
With the datetime changes (9bde7d2), the configured schema was used to create the Avro schema, which resulted in this bug. This was done because the configured schema can be a subset of the data schema, and it is consistent with the BigQueryJsonConverter behavior.
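The subset relationship can be sketched as follows (an illustrative projection only; BigQueryJsonConverter's real logic is more involved): when the configured output schema declares fewer fields than the incoming records carry, only the configured fields are written.

```python
# Illustrative sketch, not the plugin's code: project each record down
# to the fields declared in the configured (output) schema, the way a
# JSON-based converter naturally drops fields the schema does not name.

def project(record, configured_fields):
    return {k: v for k, v in record.items() if k in configured_fields}

data_record = {"id": 1, "name": "a", "debug_info": "drop me"}
configured_fields = ["id", "name"]

print(project(data_record, configured_fields))  # {'id': 1, 'name': 'a'}
```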
Moved the schema override into the OutputCommitter, applying it only to the load job settings, so that the configured schema remains unchanged and the target table schema is not modified.
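The shape of the fix can be modelled as below (hypothetical names; the real change lives in the plugin's OutputCommitter): the existing table's schema is substituted only into the load-job settings at commit time, while the configured schema that drives validation and Avro conversion stays exactly as the user defined it.

```python
# Hypothetical model of the fix, not the plugin's code: the configured
# schema is never mutated; only the load-job settings pick up the
# existing table's schema.

class LoadJobCommitter:
    def __init__(self, configured_schema):
        self.configured_schema = configured_schema

    def commit(self, table_schema):
        # Use the table's schema for the load job alone, so the target
        # table is not rewritten and the configured schema that drives
        # record conversion is left intact.
        return {"schema": table_schema}

configured = {"id": "int"}
committer = LoadJobCommitter(configured)
settings = committer.commit({"id": "INTEGER"})
assert committer.configured_schema == {"id": "int"}
assert settings["schema"] == {"id": "INTEGER"}
```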