Replication retry of LOAD into staging table may produce duplicates

Description

In replication, we first write to a staging table and then merge the staging table into the main table.
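For context, the staging load is conceptually similar to the sketch below, using the BigQuery Java client; the dataset, table, bucket, and format here are illustrative assumptions, not the actual plugin code:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FormatOptions;
    import com.google.cloud.bigquery.Job;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.LoadJobConfiguration;
    import com.google.cloud.bigquery.TableId;

    public class StagingLoadSketch {
      public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Hypothetical dataset/table/bucket names.
        TableId staging = TableId.of("my_dataset", "staging_table");
        LoadJobConfiguration config =
            LoadJobConfiguration.newBuilder(staging, "gs://my-bucket/batch.avro")
                .setFormatOptions(FormatOptions.avro())
                // WRITE_APPEND means a blind retry appends a second copy of the batch.
                .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)
                .build();
        Job job = bigquery.create(JobInfo.of(config));
        // If polling times out here ("Read timed out"), the job may still have
        // succeeded server-side even though the client reports a failure.
        job.waitFor();
      }
    }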

The scenario:

  • There was a write to the staging table with BigQuery job id 1700645782013_0.

  • According to BigQuery, this job completed successfully.

  • However, the CDF code never received the JOB DONE status because of a network issue, so CDF treated the job as failed: com.google.cloud.bigquery.BigQueryException: Read timed out

  • CDF retried and retriggered the job with id 1700645782013_1.

  • Because we write with writeDisposition: "WRITE_APPEND", the retry appended the same batch again.

This resulted in duplicate records in the staging table.

Merging the staging table into the main table then failed with the error: UPDATE/MERGE must match at most one source row for each target row.
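To see why, note that BigQuery's MERGE requires each target row to match at most one source row. A sketch of the statement's shape, with hypothetical table and column names:

    // Hypothetical shape of the staging-to-main merge; the real tables and keys
    // differ. With duplicates in staging, two source rows match one target row,
    // which BigQuery rejects with the error quoted above.
    class MergeSketch {
      static final String MERGE_SQL =
          "MERGE `my_dataset.main_table` T "
              + "USING `my_dataset.staging_table` S ON T.id = S.id "
              + "WHEN MATCHED THEN UPDATE SET T.val = S.val "
              + "WHEN NOT MATCHED THEN INSERT (id, val) VALUES (S.id, S.val)";
    }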

We should gracefully handle such retries:
- maybe check whether the staging table exists and, if it does, how many records it already holds (see the sketch below).
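A minimal sketch of such a guard, assuming the BigQuery Java client; the job id, table names, and control flow are illustrative assumptions, not the shipped fix. Before re-submitting, the retry first asks BigQuery whether the earlier attempt actually finished, then applies the row-count check suggested above:

    import java.math.BigInteger;

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.Job;
    import com.google.cloud.bigquery.JobId;
    import com.google.cloud.bigquery.JobStatus;
    import com.google.cloud.bigquery.Table;
    import com.google.cloud.bigquery.TableId;

    public class RetryGuardSketch {
      // True iff the earlier attempt already completed successfully, in which
      // case the retry must not append the batch again.
      static boolean previousAttemptSucceeded(BigQuery bq, String jobId) {
        Job prior = bq.getJob(JobId.of(jobId));
        return prior != null
            && prior.getStatus() != null
            && prior.getStatus().getState() == JobStatus.State.DONE
            && prior.getStatus().getError() == null;
      }

      public static void main(String[] args) {
        BigQuery bq = BigQueryOptions.getDefaultInstance().getService();
        if (previousAttemptSucceeded(bq, "1700645782013_0")) {
          System.out.println("Load already landed; skipping retry.");
          return;
        }
        // The ticket's suggestion: also check whether the staging table exists
        // and how many records it already holds before appending again.
        Table staging = bq.getTable(TableId.of("my_dataset", "staging_table"));
        if (staging != null) {
          BigInteger rows = staging.getNumRows();
          System.out.println("Staging currently holds " + rows + " rows.");
        }
      }
    }

An alternative hardening is to load the staging table with WRITE_TRUNCATE instead of WRITE_APPEND, so a retried load replaces the batch rather than duplicating it; this assumes each staging table receives exactly one load.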

Error stack trace:

Release Notes

None

Activity


Sumit Jain, December 13, 2023 at 1:41 PM

This issue should already be fixed in 6.9.0.

Resolution: Duplicate

Details

Triaged: No
Size: M

Created: November 30, 2023 at 10:56 AM
Updated: December 14, 2023 at 11:35 AM
Resolved: December 14, 2023 at 10:57 AM