BigQuery SCD Type Support
SCD Type 1 (Upsert)
SCD Type 2
SCDD Type 2 creates new dimensional records that retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. There are various ways to determine the current active records
Effective time and expiration time to identify the time period between which the record was active, and
Version, Last Update Time to identify the latest version and the last time the record was updated.
SC Type 2
MERGE INTO `<project>.<dataset>.dimensiontable`
USING (
SELECT `<project>.<dataset>.staging`.*
FROM `<project>.<dataset>.staging`
UNION ALL
SELECT `<project>.<dataset>.staging`.*
FROM `<project>.<dataset>.staging`
JOIN `<project>.<dataset>.dimensiontable`
ON `<project>.<dataset>.staging`.<key1> = `<project>.<dataset>.dimensiontable`.<key1>
WHERE (
`<project>.<dataset>.stg1`.<attribute1> <> `<project>.<dataset>.fact`.<attribute1> OR
`<project>.<dataset>.stg1`.<attribute2> <> `<project>.<dataset>.fact`.<attribute2> OR
`<project>.<dataset>.stg1`.<attribute3> <> `<project>.<dataset>.fact`.<attribute3> OR
`<project>.<dataset>.stg1`.<attribute4> <> `<project>.<dataset>.fact`.<attribute4>
)
) subQuery
ON (
sub.<key1> = `<project>.<dataset>.dimensiontable`.<key1> AND
sub.<key2> = `<project>.<dataset>.dimensiontable`.<key2> AND
sub.<key3> = `<project>.<dataset>.dimensiontable`.<key3>
)
WHEN MATCHED AND (
subQuery.<attribute1> <> `<project>.<dataset>.dimensiontable`.<key1> OR
subQuery.<attribute2> <> `<project>.<dataset>.dimensiontable`.<key2> OR
subQuery.<attribute3> <> `<project>.<dataset>.dimensiontable`.<key3> OR
subQuery.<attribute4> <> `<project>.<dataset>.dimensiontable`.<key4>
)
THEN UPDATE SET
lastupdt = CURRENT_DATETIME(),
version = version + 1
WHEN NOT MATCHED
THEN INSERT (
<key1>,
<key2>,
<key3>,
<attribute1>,
...
<attribute4>,
...
<attributen>,
version,
last_updt)
VALUES (
subQuery.<key1>,
subQuery.<key2>,
subQuery.<attribute1>,
...,
<attribute4>,
...
<attributen>,
1,
CURRENT_DATETIME()
);How to
Add SCD Type 2 Option : https://github.com/data-integrations/google-cloud/blob/develop/src/main/java/io/cdap/plugin/gcp/bigquery/sink/Operation.java#L9
Add Query Template : https://github.com/data-integrations/google-cloud/blob/develop/src/main/java/io/cdap/plugin/gcp/bigquery/sink/BigQueryOutputFormat.java#L97
Fill right values for templated query : https://github.com/data-integrations/google-cloud/blob/develop/src/main/java/io/cdap/plugin/gcp/bigquery/sink/BigQueryOutputFormat.java#L458
Add SCD Type 2 Option : https://github.com/data-integrations/google-cloud/blob/develop/widgets/BigQueryTable-batchsink.json#L69
Created in 2020 by Google Inc.