BigQuery sink should use actual table's partitioning type

Description

To reproduce, configure a pipeline that writes to an existing table with no partitioning, but set the sink's partitioning type to time and specify a partition field.

The pipeline fails with a runtime error at the end of the run, when it tries to load the data from GCS into BigQuery.

Because the partitioning type is in the 'auto create' section, it should only be used when the sink auto-creates the table. Otherwise, partitioning should be based on the existing table's actual partitioning information. This would also make the sink work properly when it is not configured with any partitioning information but the existing table happens to be partitioned already.
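The proposed behavior could be sketched roughly as follows. This is only an illustration of the decision described above, not the plugin's actual code: the class, enum, and method names are hypothetical, and the INTEGER value is included on the assumption that the sink also offers integer range partitioning.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these names are taken from the plugin source. */
final class PartitioningResolver {

  /** Partitioning kinds considered in this sketch (INTEGER is an assumption). */
  enum PartitioningType { NONE, TIME, INTEGER }

  private PartitioningResolver() { }

  /**
   * Decides which partitioning type the load job should use.
   *
   * @param existingTableType partitioning of the destination table,
   *                          or empty if the table does not exist yet
   * @param configuredType    partitioning type from the sink's auto-create
   *                          settings, or null if the user left it unset
   */
  static PartitioningType resolve(Optional<PartitioningType> existingTableType,
                                  PartitioningType configuredType) {
    if (existingTableType.isPresent()) {
      // The table already exists: its real partitioning wins and the
      // auto-create setting is ignored.
      return existingTableType.get();
    }
    // The table will be auto-created: use the configured type, falling back
    // to TIME when no value is given (the default described in this ticket).
    return configuredType == null ? PartitioningType.TIME : configuredType;
  }
}
```

With this rule, an unpartitioned existing table combined with a sink configured for time partitioning resolves to NONE, so the load job would not request a partitioning that the table does not have.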

This problem is exacerbated by the fact that the partitioning type defaults to time when no value is given. An old pipeline can therefore set a partition field that is ignored because the table is not partitioned, and then start failing after the pipeline is upgraded.

Release Notes

None

Assignee

Bhooshan Mogal

Reporter

Albert Shau