Wrangler concepts

Wrangler uses the concepts of record, column, directive, recipe, transformation step, and data pipeline.

Record

A record is a collection of field names and field values.

In this documentation, a record is shown as a JSON object whose keys are the column names and whose values are the plain representation of the data, without any type information.

For example:

{ "id": 1, "fname": "root", "lname": "joltie", "address": { "housenumber": "678", "street": "Mars Street", "city": "Marcity", "state": "Maregon", "country": "Mari" }, "gender": "M" }
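To make the notation concrete, the following minimal Python sketch parses the example record into a dict, where each key is a column name and each value is a field value. It only illustrates the record concept; Wrangler does not necessarily store records this way internally, and `column_names` is a hypothetical helper.

```python
import json

# The example record from above, as a JSON string.
RECORD_JSON = """
{ "id": 1, "fname": "root", "lname": "joltie",
  "address": { "housenumber": "678", "street": "Mars Street",
               "city": "Marcity", "state": "Maregon", "country": "Mari" },
  "gender": "M" }
"""

# Parsed into a dict: keys are column names, values are field values.
record = json.loads(RECORD_JSON)


def column_names(rec: dict) -> list:
    """Hypothetical helper: return a record's column names in their original order."""
    return list(rec.keys())


print(column_names(record))       # ['id', 'fname', 'lname', 'address', 'gender']
print(record["address"]["city"])  # Marcity (a nested object is itself a field value)
```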

Column

A column is a group of field values of any of the supported data types. Each field value is part of one record.
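The relationship between columns and records can be sketched in Python as follows, assuming each record is a dict keyed by column name. The records and the `column_values` helper are hypothetical, and returning `None` for a missing field is an assumption made here so that every value still lines up with exactly one record.

```python
from typing import Any, Dict, List

# Hypothetical records; each dict is one record (column name -> field value).
records = [
    {"id": 1, "fname": "root", "gender": "M"},
    {"id": 2, "fname": "ada", "gender": "F"},
    {"id": 3, "fname": "lin"},  # this record has no "gender" field
]


def column_values(recs: List[Dict[str, Any]], name: str) -> List[Any]:
    """Collect one column's field values, one entry per record.

    Records that lack the field contribute None (an assumption of this sketch).
    """
    return [rec.get(name) for rec in recs]


print(column_values(records, "fname"))   # ['root', 'ada', 'lin']
print(column_values(records, "gender"))  # ['M', 'F', None]
```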

Directive

A directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a data pipeline.
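One way to model "one record into zero or more records" is a function that takes a record and returns a list of records. The Python sketch below is an illustration under that assumption, not Wrangler's implementation; all three functions are hypothetical. The first is a transform (one record out), the second a filter (zero or one record out), and the third a one-to-many split that stands in for directives that turn one record into several.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def uppercase_fname(rec: Record) -> List[Record]:
    """Transform: one record in, one modified record out."""
    out = dict(rec)
    out["fname"] = str(out["fname"]).upper()
    return [out]


def keep_gender_m(rec: Record) -> List[Record]:
    """Filter: one record in, zero or one record out."""
    return [rec] if rec.get("gender") == "M" else []


def split_tags(rec: Record) -> List[Record]:
    """One-to-many: one record in, one record per element of the 'tags' list out."""
    return [{**rec, "tags": tag} for tag in rec.get("tags", [])]


rec = {"id": 1, "fname": "root", "gender": "M", "tags": ["a", "b"]}
print(uppercase_fname(rec)[0]["fname"])                   # ROOT
print(keep_gender_m({"id": 2, "fname": "ada", "gender": "F"}))  # []
print(len(split_tags(rec)))                               # 2
```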

A directive can be represented in text in this format:

<command> <argument-1> <argument-2> ... <argument-n>
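The text format above can be split into a command and its arguments with a few lines of Python. This is a minimal sketch: `parse_directive` and `ParsedDirective` are hypothetical, using `shlex.split` (which also honors quoted arguments) is an assumption rather than Wrangler's actual grammar, and the sample directive string is only illustrative of the format; consult the Wrangler directive reference for exact command names and syntax.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedDirective:
    command: str
    arguments: List[str]


def parse_directive(text: str) -> ParsedDirective:
    """Split '<command> <argument-1> ... <argument-n>' into its parts.

    Assumption: arguments are whitespace-separated, and quotes group words.
    """
    tokens = shlex.split(text)
    if not tokens:
        raise ValueError("empty directive")
    return ParsedDirective(command=tokens[0], arguments=tokens[1:])


d = parse_directive("set-type :Fare integer")  # illustrative directive text
print(d.command)    # set-type
print(d.arguments)  # [':Fare', 'integer']
```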

Recipe

A recipe is an ordered set of one or more directives. For example, a recipe can consist of a single directive that changes the data type of the Fare column to integer.
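The following Python sketch treats a recipe as text with one directive per line and compiles it, in order, into a list of per-record step functions. Everything here is hypothetical and for illustration only: the `make_set_type` factory, the `REGISTRY` mapping, the converter names, and the `:Fare` column syntax are assumptions, not Wrangler's internals or its exact directive syntax.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[Record], List[Record]]


def make_set_type(column: str, type_name: str) -> Step:
    """Hypothetical factory: build a step that converts `column` to `type_name`."""
    converters = {"integer": int, "double": float, "string": str}
    convert = converters[type_name]
    name = column.lstrip(":")  # assumed ':Column' notation for column names

    def step(rec: Record) -> List[Record]:
        return [{**rec, name: convert(rec[name])}]

    return step


# Hypothetical registry from command name to step factory.
REGISTRY: Dict[str, Callable[..., Step]] = {"set-type": make_set_type}

RECIPE = """
set-type :Fare integer
"""


def compile_recipe(text: str) -> List[Step]:
    """Turn each non-blank line (one directive) into a step, keeping recipe order."""
    steps = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        command, args = tokens[0], tokens[1:]
        steps.append(REGISTRY[command](*args))
    return steps


steps = compile_recipe(RECIPE)
print(steps[0]({"id": 1, "Fare": "7"}))  # [{'id': 1, 'Fare': 7}]
```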

Transformation step

A transformation step is an implementation of a data transformation directive, operating on a single record or set of records. A transformation step can generate zero or more records from the application of a directive. Pipeline Studio applies the transformation steps in the order listed in the recipe.
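Applying steps in recipe order, where each step may emit zero or more records, amounts to a chain of flat-map operations. This Python sketch assumes each step is a function from one record to a list of records; `apply_recipe` and both steps are hypothetical. Processing all records through one step before moving to the next gives the same output as processing each record through every step, because each step here looks at only one record at a time.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Step = Callable[[Record], List[Record]]


def apply_recipe(steps: List[Step], records: Iterable[Record]) -> List[Record]:
    """Apply steps in recipe order; each step may emit zero or more records."""
    current = list(records)
    for step in steps:
        current = [out for rec in current for out in step(rec)]
    return current


def drop_missing_fare(rec: Record) -> List[Record]:
    # Hypothetical filter step: zero records out when Fare is empty.
    return [rec] if rec.get("Fare") not in (None, "") else []


def fare_to_int(rec: Record) -> List[Record]:
    # Hypothetical transform step: exactly one record out.
    return [{**rec, "Fare": int(rec["Fare"])}]


rows = [{"id": 1, "Fare": "7"}, {"id": 2, "Fare": ""}, {"id": 3, "Fare": "12"}]
print(apply_recipe([drop_missing_fare, fare_to_int], rows))
# [{'id': 1, 'Fare': 7}, {'id': 3, 'Fare': 12}]
```

Order matters: listing `fare_to_int` first would raise a `ValueError` on the empty Fare value, because the filter would not yet have removed that record.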

Data pipeline

A data pipeline is a collection of stages applied to records. The records output by one stage are passed to the next stage in the pipeline.
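The stage-to-stage hand-off can be sketched in Python with generator functions, each taking the previous stage's records and yielding its own. This is a simplified model, not how Pipeline Studio executes pipelines; the `source`, `wrangle`, and `expensive_only` stages and the `run_pipeline` runner are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def source(_: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical source stage: ignores its input and emits fixed records.
    yield {"id": 1, "Fare": "7"}
    yield {"id": 2, "Fare": "30"}


def wrangle(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical transform stage standing in for a Wrangler recipe.
    for rec in records:
        yield {**rec, "Fare": int(rec["Fare"])}


def expensive_only(records: Iterable[Record]) -> Iterator[Record]:
    # Hypothetical filter stage: passes on records with Fare above 10.
    for rec in records:
        if rec["Fare"] > 10:
            yield rec


def run_pipeline(stages: List[Stage]) -> List[Record]:
    """Feed each stage's output records into the next stage."""
    records: Iterable[Record] = []
    for stage in stages:
        records = stage(records)
    return list(records)


print(run_pipeline([source, wrangle, expensive_only]))  # [{'id': 2, 'Fare': 30}]
```

Because the stages are generators, records flow through the chain lazily; nothing runs until `list(records)` consumes the last stage.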



Created in 2020 by Google Inc.