Sending records to error

There are three types of errors that can occur:

  • Systemic

  • Logical

  • Data

Systemic errors include a service or an instance failing. Logical errors include a pipeline run failing. Data errors include invalid credit card numbers, invalid date formats, and invalid zip codes.

Wrangler provides a set of over 50 functions to help you remove common errors from a dataset.

To send records to error, follow these steps:

  1. Click the drop-down button next to the column name.

  2. Click Send to error, and then select the condition to send bad records to error.

Wrangler removes values that match the specified condition from the sample and adds the send to error directive to the recipe. When you run the data pipeline, the transformation is applied to all values in the column.
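The effect of a send to error condition can be pictured with a small Python sketch. This is only a conceptual model, not Wrangler's implementation or directive syntax: the function names, the zip code pattern, and the sample records are all hypothetical and chosen for illustration.

```python
import re

# Assumption: a "valid" zip code is 5 digits, optionally followed by -4 digits.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def is_bad_zip(value):
    """Hypothetical condition: True when the value should be sent to error."""
    return value is None or ZIP_PATTERN.match(str(value)) is None


def split_on_condition(records, column, condition):
    """Partition records into (kept, errors) by testing one column."""
    kept, errors = [], []
    for record in records:
        if condition(record.get(column)):
            errors.append(record)   # matches the condition: flagged as an error
        else:
            kept.append(record)     # stays in the dataset
    return kept, errors


# Hypothetical sample data.
sample = [
    {"name": "Ada", "zip": "94043"},
    {"name": "Bob", "zip": "9404"},
    {"name": "Cy", "zip": "10001-1234"},
    {"name": "Di", "zip": None},
]

kept, errors = split_on_condition(sample, "zip", is_bad_zip)
print("kept:", [r["name"] for r in kept])
print("errors:", [r["name"] for r in errors])
```

In this model, the records in `errors` correspond to the values removed from the sample, and the same condition would be applied to every record when the full pipeline runs.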

Adding an Error Collector plugin to a data pipeline

When you add a Wrangler transformation with a recipe that includes the send to error directive to a data pipeline, you can choose to connect it to an Error Collector plugin. Error Collector plugins are usually connected to a downstream sink plugin, such as GCS.

When you run a pipeline with an Error Collector, the records flagged with send to error flow from the Wrangler transformation to the Error Collector and then to a sink plugin. After the pipeline run completes, you can examine the bad records written to the sink to better understand the problems with the data.

If your recipe includes send to error transformation steps, but the pipeline doesn’t include an Error Collector, the records flagged as send to error are dropped during the pipeline run.
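The difference between running with and without an Error Collector can be sketched as a simple routing model in Python. This is an illustrative simulation under stated assumptions, not the pipeline engine's actual behavior or API: `run_pipeline`, the `error_collector_connected` flag, and the sample records are hypothetical.

```python
def run_pipeline(records, is_bad, error_collector_connected):
    """Simulate one pipeline run.

    Returns (main_sink, error_sink, dropped_count). Flagged records reach
    error_sink only when an Error Collector is connected; otherwise they
    are dropped.
    """
    main_sink, error_sink = [], []
    dropped = 0
    for record in records:
        if not is_bad(record):
            main_sink.append(record)      # good record: goes to the main sink
        elif error_collector_connected:
            error_sink.append(record)     # flagged record: collected for review
        else:
            dropped += 1                  # flagged record: lost without a collector
    return main_sink, error_sink, dropped


def negative_amount(record):
    """Hypothetical send to error condition."""
    return record["amount"] < 0


# Hypothetical sample data.
records = [{"id": 1, "amount": 20}, {"id": 2, "amount": -5}, {"id": 3, "amount": 7}]

with_collector = run_pipeline(records, negative_amount, True)
without_collector = run_pipeline(records, negative_amount, False)
print(with_collector)
print(without_collector)
```

In both runs the main sink receives the same good records. Only the run with the collector connected preserves the flagged record for later inspection.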

 

 

Created in 2020 by Google Inc.