Plugin version: 4.9.0

Wrangler is an interactive tool that lets you perform transformations on a subset of your data. It lets you apply directives and create recipes using the UI or JEXL commands. This plugin applies data transformation directives to your data records. The directives are generated either through an interactive user interface or by manual entry into the plugin.

BigQuery ELT Pushdown for Preconditions (6.9.0+)

The Precondition step of a Wrangler stage in a pipeline is now eligible to execute in BigQuery when BigQuery ELT Transformation Pushdown is enabled in a pipeline. This is only supported when the Precondition Language is set to SQL.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Input field name

Yes

Required. The name of the input field (or * for all fields).

Default is * (asterisk).

Precondition Language

Yes

6.9.0/4.9.0

Required. This is a language selector for preconditions (JEXL/SQL).

Default is JEXL.

Precondition (JEXL)

Yes

6.9.0/4.9.0

Required. A JEXL filter to be applied before the directives are executed.

Default is false.

Directives (Recipe)

Yes

Required. The series of directives to be applied to the input records.

User Defined Directives (UDD)

No

Optional. List of User Defined Directives (UDD) that must be loaded.

Error Handling

Yes

Required. Strategy to handle erroneous records.

  • Skip error. Ignores records with errors. The pipeline proceeds when there is an error. 

  • Send to error port. Collects the erroneous records and sends them to the Error Collector. The pipeline proceeds and does not fail. See Sending records to error.

  • Fail pipeline. Fails the pipeline when the first error is encountered in transformation.

For example, if the directive set-type :col_name integer is applied to a column that contains string values in certain rows, those rows result in an error.

Default is Fail pipeline.

Output Schema

Yes

Required. The output schema for the data.
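
The three Error Handling strategies listed in the table above can be illustrated with a small Python model. This is only a sketch of the described behavior, not Wrangler's implementation: the names ErrorHandling, to_integer, and run_stage are hypothetical, and to_integer merely stands in for a directive such as set-type :col_name integer.

```python
from enum import Enum


class ErrorHandling(Enum):
    # Hypothetical names mirroring the three documented strategies.
    SKIP_ERROR = "Skip error"
    SEND_TO_ERROR_PORT = "Send to error port"
    FAIL_PIPELINE = "Fail pipeline"


def to_integer(record, field):
    """Stand-in for a type-conversion directive; raises ValueError on bad input."""
    converted = dict(record)
    converted[field] = int(record[field])
    return converted


def run_stage(records, field, strategy=ErrorHandling.FAIL_PIPELINE):
    """Return (output_records, error_records) for the chosen strategy."""
    output, errors = [], []
    for record in records:
        try:
            output.append(to_integer(record, field))
        except ValueError as exc:
            if strategy is ErrorHandling.FAIL_PIPELINE:
                # The first erroneous record stops processing.
                raise RuntimeError(f"pipeline failed: {exc}") from exc
            if strategy is ErrorHandling.SEND_TO_ERROR_PORT:
                # The record is collected for the error port; processing continues.
                errors.append(record)
            # SKIP_ERROR: the record is ignored; processing continues.
    return output, errors


rows = [{"age": "42"}, {"age": "n/a"}, {"age": "7"}]
skipped = run_stage(rows, "age", ErrorHandling.SKIP_ERROR)
routed = run_stage(rows, "age", ErrorHandling.SEND_TO_ERROR_PORT)
```

With Skip error the bad row simply disappears from the output, with Send to error port it is returned separately, and with the default Fail pipeline the same input raises on the second row.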

Directives

There are numerous directives and variations supported by CDAP, which are documented at http://github.com/hydrator/wrangler. See Directives.

For information about working with decimals and BigDecimal types, see Working with Decimal types in Wrangler.

Usage Notes

All input record fields are made available to the directives when * is used as the field to be transformed. They are in the record in the same order as they appear. Note that if the transform doesn't operate on all of the input record fields or a field is not configured as part of the output schema, and you are using the set-columns directive, you may see inconsistent behavior. Use the drop directive to drop any fields that are not used.

Precondition Language is set to JEXL by default. It can be switched between SQL and JEXL.

If Precondition Language is set to SQL, the Directive and UDD fields must be blank. If these fields have values, plugin validation fails. In addition, Wrangler doesn't support multiple input stages when the Precondition Language is set to SQL.
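
The SQL precondition constraints described above can be summarized as a small validation routine. The following Python sketch is illustrative only: validate_config and its parameter names are hypothetical and do not correspond to the plugin's actual property keys or validation code, and treating multiple input stages as a validation error is an assumption made for the example.

```python
def validate_config(precondition_language, directives="", udds="", input_stage_count=1):
    """Return a list of validation problems; an empty list means the config passes."""
    problems = []
    if precondition_language.upper() == "SQL":
        if directives.strip():
            problems.append("Directives must be blank when Precondition Language is SQL")
        if udds.strip():
            problems.append("UDD must be blank when Precondition Language is SQL")
        if input_stage_count > 1:
            # Assumption: modeled here as a validation problem for illustration.
            problems.append("Multiple input stages are not supported with SQL preconditions")
    return problems


ok = validate_config("SQL")
bad = validate_config("SQL", directives="drop :body", input_stage_count=2)
jexl = validate_config("JEXL", directives="drop :body")
```

Here ok and jexl are empty lists, while bad reports both the non-blank directives and the second input stage.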

A precondition filter is useful to filter records before the directives are applied to the records. To filter a record, specify a condition that will result in a Boolean state of true.

For example, to filter out all records that have a value of under 18 for an age field, you could use this filter:

Code Block
  age < 18
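
The precondition semantics (a record is filtered out when the condition evaluates to true) can be modeled in a few lines of Python. This is a sketch under that stated reading, not how Wrangler evaluates JEXL; apply_precondition, is_minor, and the sample records are hypothetical.

```python
def apply_precondition(records, precondition):
    """Keep only the records for which the precondition is not true."""
    return [record for record in records if not precondition(record)]


def is_minor(record):
    # Python equivalent of the filter `age < 18`.
    return record["age"] < 18


people = [
    {"name": "Ana", "age": 17},
    {"name": "Ben", "age": 18},
    {"name": "Cy", "age": 45},
]
remaining = apply_precondition(people, is_minor)
```

Only the records for Ben and Cy remain, because the condition is true for Ana and her record is filtered out before any directives would run.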