Wrangler concepts

Wrangler uses the concepts of record, column, directive, recipe, transformation step, and data pipeline.

Record

A record is a collection of field names and field values.

In this documentation, a record is shown as a JSON object in which each key is a column name and each value is the plain representation of the data, without any mention of types.

For example:

{
  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  },
  "gender": "M"
}
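
To make the key and value structure concrete, the record above can be loaded into a Python dictionary. This is only an illustrative sketch of the concept; it does not describe how Wrangler stores records internally, and the variable names are chosen for this example.

```python
import json

# The example record from this section, as JSON text.
record_json = """
{
  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  },
  "gender": "M"
}
"""

record = json.loads(record_json)

# Top-level keys correspond to the column names of the record.
column_names = list(record.keys())

# Nested values are reached by chaining keys.
city = record["address"]["city"]

print(column_names)
print(city)
```

Here column_names lists the five top-level columns, and address is itself a nested object whose fields are read with a second key.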

Column

A column is a group of field values of any of the supported data types. Each field value is part of one record.
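
The relationship between records and columns can be sketched in Python as follows. The second record and the column() helper are hypothetical and exist only for this illustration; they are not part of Wrangler.

```python
# Two illustrative records; the second one is made up for this example.
records = [
    {"id": 1, "fname": "root", "gender": "M"},
    {"id": 2, "fname": "joltie", "gender": "F"},
]


def column(records, name):
    """Collect the value of one field from every record (None if absent)."""
    return [record.get(name) for record in records]


# Each value in the column belongs to exactly one record.
fnames = column(records, "fname")
print(fnames)
```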

Directive

A directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a data pipeline.

A directive can be represented in text in this format:

<command> <argument-1> <argument-2> ... <argument-n>
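
As a minimal sketch of this text format, the following Python function splits a directive into its command and arguments. It assumes simple whitespace-separated tokens with shell-style quoting, which is a simplification: Wrangler's actual directive grammar is not described here, and the command name example-command is a placeholder, not a real directive.

```python
import shlex


def parse_directive(text):
    """Split directive text into (command, [arguments]).

    Assumption: tokens are separated by whitespace and may be quoted
    shell-style. The real Wrangler parser may behave differently.
    """
    tokens = shlex.split(text)
    if not tokens:
        raise ValueError("empty directive")
    return tokens[0], tokens[1:]


# "example-command" is a placeholder name used only for illustration.
command, arguments = parse_directive("example-command arg1 'second arg'")
print(command, arguments)
```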

Recipe

A recipe is an ordered set of one or more directives. For example, a recipe with a single directive can change the data type of the Fare column to integer.

Transformation step

A transformation step is an implementation of a data transformation directive, operating on a single record or set of records. A transformation step can generate zero or more records from the application of a directive. Pipeline Studio applies the transformation steps in the order listed in the recipe.
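
The idea that a step can produce zero or more records, and that steps run in recipe order, can be sketched in Python. The step functions and the apply_steps() helper below are hypothetical and only model the concept; they are not Wrangler or Pipeline Studio APIs.

```python
def keep_if_gender_m(record):
    # A filtering step: emits the record or nothing (zero records).
    return [record] if record.get("gender") == "M" else []


def uppercase_fname(record):
    # A transforming step: emits exactly one modified copy of the record.
    return [{**record, "fname": record["fname"].upper()}]


def apply_steps(records, steps):
    """Apply each step, in the order listed, to every record."""
    for step in steps:
        records = [out for record in records for out in step(record)]
    return records


# Illustrative input; the second record is made up for this example.
records = [
    {"fname": "root", "gender": "M"},
    {"fname": "jane", "gender": "F"},
]
result = apply_steps(records, [keep_if_gender_m, uppercase_fname])
print(result)
```

Because each step returns a list, a single function signature covers filtering (empty list), one-to-one transformation (one element), and one-to-many expansion (several elements).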

Data pipeline

A data pipeline is a collection of stages to be applied to a record. The records output by a stage are passed to the next stage in the pipeline.
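
The hand-off of records from one stage to the next can be modeled with Python generators. This is a conceptual sketch under the assumption that each stage consumes a stream of records and yields a new stream; the stage names and run_pipeline() are hypothetical.

```python
from functools import reduce


def drop_missing_id(records):
    # Stage 1: pass along only records that have an id.
    for record in records:
        if record.get("id") is not None:
            yield record


def add_full_name(records):
    # Stage 2: add a derived field to each record it receives.
    for record in records:
        yield {**record, "full_name": f"{record['fname']} {record['lname']}"}


def run_pipeline(records, stages):
    """Feed the output of each stage into the next stage, in order."""
    return list(reduce(lambda stream, stage: stage(stream), stages, records))


# Illustrative input; the second record is made up for this example.
source = [
    {"id": 1, "fname": "root", "lname": "joltie"},
    {"id": None, "fname": "x", "lname": "y"},
]
output = run_pipeline(source, [drop_missing_id, add_full_name])
print(output)
```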