Details
Assignee: Unassigned
Reporter: Sanket Sahu
Triaged: No
Size: M
Components:
Priority: Major
Created November 30, 2023 at 10:56 AM
Updated December 14, 2023 at 11:35 AM
Resolved December 14, 2023 at 10:57 AM
In replication, we first write to a staging table and then merge the staging table into the main table.
The scenario:
There was a write to the staging table with BigQuery job ID 1700645782013_0. According to BigQuery this job succeeded, but the CDF code never received the JOB DONE status because of a network issue, so CDF assumed the job had failed:
com.google.cloud.bigquery.BigQueryException: Read timed out
CDF retried and retriggered the job with ID 1700645782013_1. Since we write with writeDisposition: "WRITE_APPEND", this resulted in duplicate records in the staging table.
Merging the staging table into the main table then failed with:
UPDATE/MERGE must match at most one source row for each target row
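BigQuery raises this error whenever more than one source row matches the same target row in a MERGE. One way to make the merge tolerant of duplicate staging rows is to deduplicate the source side inside the MERGE itself. This is only a sketch; the table names (staging_t, main_t), key column (id), and ordering column (_loaded_at) are illustrative assumptions, not names from our pipeline:

```sql
-- Sketch: deduplicate the staging side so each target row matches at most
-- one source row. Table/column names here are illustrative assumptions.
MERGE main_t AS target
USING (
  SELECT * EXCEPT (rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY _loaded_at DESC) AS rn
    FROM staging_t
  )
  WHERE rn = 1            -- keep one row per key
) AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.val = source.val
WHEN NOT MATCHED THEN
  INSERT (id, val) VALUES (source.id, source.val);
```

This masks the symptom rather than fixing the retry itself, but it keeps the merge from aborting when a retried append has already duplicated the staging data.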
We should handle such retries gracefully, for example by checking whether the staging table exists and, if it does, how many records it contains.
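Another option is to make the retry itself idempotent. Below is a sketch using the google-cloud-bigquery Java client: before resubmitting, look up the original job by its deterministic ID; if BigQuery reports it as done without errors, the first attempt actually succeeded (only the JOB DONE status was lost to the network timeout), so we must not re-append. The helper name and structure are hypothetical, not CDF's actual code:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;

public class IdempotentRetry {

  // Sketch (hypothetical helper): decide whether a retry should resubmit
  // the write job, based on the fate of the original deterministic job ID
  // (e.g. "1700645782013_0").
  static boolean shouldResubmit(BigQuery bigquery, String deterministicJobId)
      throws InterruptedException {
    Job existing = bigquery.getJob(JobId.of(deterministicJobId));
    if (existing == null) {
      // The job never reached BigQuery; submitting it now is safe.
      return true;
    }
    // The job exists server-side; wait for it to finish rather than
    // assuming failure on a client-side read timeout.
    Job completed = existing.waitFor();
    // Resubmit only if the job genuinely failed on the server.
    return completed == null || completed.getStatus().getError() != null;
  }
}
```

With this check in place, a client-side "Read timed out" no longer triggers a second WRITE_APPEND of data that BigQuery has already committed.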
Error stack trace: