Wrangler cheatsheet

Name

Usage

Description

Name

Usage

Description

SWAP

swap <column1> <column2>

Swaps the column names of two columns.

ENCODE

encode <base32

base64

XPATH

xpath <column> <destination> <xpath>

Extract a single XML element or attribute using XPath.

GENERATE-UUID

generate-uuid <column>

Populates a column with a universally unique identifier (UUID) of the record.

LOWERCASE

lowercase <column>

Changes the column values to lowercase.

WRITE-AS-CSV

write-as-csv <column>

Writes the records files as well-formatted CSV

PARSE-AS-PROTOBUF

parse-as-protobuf <column> <schema-id> <record-name> [version]

Parses column as protobuf encoded memory representations.

HASH

hash <column> <algorithm> [<encode=true

false>]

JSON-PATH

json-path <source> <destination> <json-path-expression>

Parses JSON elements using a DSL (a JSON path expression).

MASK-NUMBER

mask-number <column> <pattern>

Masks a column value using the specified masking pattern.

TEXT-DISTANCE

text-distance <method> <column1> <column2> <destination>

Calculates a text distance measure between two columns containing string.

PARSE-XML-TO-JSON

parse-xml-to-json <column> [<depth>]

Parses a XML document to JSON representation.

PARSE-AS-HL7

parse-as-hl7 <column> [<depth>]

Parses <column> for Health Level 7 Version 2 (HL7 V2) messages; <depth> indicates at which point JSON object enumeration terminates.

FIND-AND-REPLACE

find-and-replace <column> <sed-expression>

Finds and replaces text in column values using a sed-format expression.

RENAME

rename <old> <new>

Renames an existing column.

PARSE-AS-AVRO

parse-as-avro <column> <schema-id> <json

binary> [version]

FILL-NULL-OR-EMPTY

fill-null-or-empty <column> <fixed-value>

Fills a value of a column with a fixed value if it is either null or empty.

SET-TYPE

set-type <column> <type>

Converting data type of a column.

RTRIM

rtrim <column>

Trimming whitespace from right side of a string.

INVOKE-HTTP

invoke-http <url> <column>[,<column>] <header>[,<header>]

[EXPERIMENTAL] Invokes an HTTP endpoint, passing columns as a JSON map (potentially slow).

COLUMNS-REPLACE

columns-replace <sed-expression>

Modifies column names in bulk using a sed-format expression.

SEND-TO-ERROR

send-to-error <condition>

Send records that match condition to the error collector.

SET-RECORD-DELIM

set-record-delim <column> <delimiter> [<limit>]

Sets the record delimiter.

SET-VARIABLE

set-variable <variable> <expression>

Sets the value for a transient variable for the record being processed.

SET-CHARSET

set-charset <column> <charset>

Sets the character set decoding to UTF-8.

WRITE-AS-JSON-OBJECT

write-as-json-object <dest-column> [<src-column>[,<src-column>]

Creates a JSON object based on source columns specified. JSON object is written into dest-column.

KEEP

keep <column>[,<column>*]

Keeps the specified columns and drops all others.

CUT-CHARACTER

cut-character <source> <destination> <type> <range

indexes>

SPLIT-TO-ROWS

split-to-rows <column> <separator>

Splits a column into multiple rows, copies the rest of the columns.

XPATH-ARRAY

xpath-array <column> <destination> <xpath>

Extract XML element or attributes as JSON array using XPath.

FAIL

fail <condition>

Fails when the condition is evaluated to true.

INCREMENT-VARIABLE

increment-variable <variable> <value> <expression>

Wrangler - A interactive tool for data cleansing and transformation.

PARSE-AS-XML

parse-as-xml <column>

Parses a column as XML.

PARSE-AS-FIXED-LENGTH

parse-as-fixed-length <column> <width>[,<width>*] [<padding-character>]

Parses fixed-length records using the specified widths and padding-character.

CHANGE-COLUMN-CASE

change-column-case lower

upper

SPLIT-EMAIL

split-email <column>

Split a email into account and domain.

URL-ENCODE

url-encode <column>

URL encode a column value.

WRITE-AS-JSON-MAP

write-as-json-map <column>

Writes all record columns as JSON map.

MASK-SHUFFLE

mask-shuffle <column>

Masks a column value by shuffling characters while maintaining the same length.

DROP

drop <column>[,<column>*]

Drop one or more columns.

DECODE

decode <base32

base64

SPLIT

split <source> <delimiter> <new-column-1> <new-column-2>

[DEPRECATED] Use 'split-to-columns' or 'split-to-rows'.

PARSE-AS-SIMPLE-DATE

parse-as-simple-date <column> <format>

Parses a column as date using format.

DIFF-DATE

diff-date <column1> <column2> <destination>

Calculates the difference in milliseconds between two Date objects.Positive if <column2> earlier. Must use 'parse-as-date' or 'parse-as-simple-date' first.

INDEXSPLIT

indexsplit <source> <start> <end> <destination>

[DEPRECATED] Use the 'split-to-columns' or 'parse-as-fixed-length' directives instead.

PARSE-AS-AVRO-FILE

parse-as-avro-file <column>

parse-as-avro-file <column>.

FILTER-ROW-IF-TRUE

filter-row-if-true <condition>

[DEPRECATED] Filters rows if condition is evaluated to true. Use 'filter-rows-on' instead.

SPLIT-URL

split-url <column>

Split a url into it's components host,protocol,port,etc.

FORMAT-DATE

format-date <column> <format>

Formats a column using a date-time format. Use 'parse-as-date` beforehand.

QUANTIZE

quantize <source> <destination> <[range1:range2)=value>,[<range1:range2=value>]*

Quanitize the range of numbers into label values.

PARSE-AS-EXCEL

parse-as-excel <column> [<sheet number

sheet name>]

PARSE-AS-DATE

parse-as-date <column> [<timezone>]

Parses column values as dates using natural language processing and automatically identifying the format (expensive in terms of time consumed).

TABLE-LOOKUP

table-lookup <column> <table>

Uses the given column as a key to perform a lookup into the specified table.

FILTER-ROWS-ON

filter-rows-on empty-or-null-columns <column>[,<column>*]

Filters row that have empty or null columns.

TRIM

trim <column>

Trimming whitespace from both sides of a string.

URL-DECODE

url-decode <column>

URL decode a column value.

FLATTEN

flatten <column>[,<column>*]

Separates array elements of one or more columns into indvidual records, copying the other columns.

UPPERCASE

uppercase <column>

Changes the column values to uppercase.

CATALOG-LOOKUP

catalog-lookup <catalog> <column>

Looks-up values from pre-loaded (static) catalogs.

PARSE-AS-LOG

parse-as-log <column> <format>

Parses Apache HTTPD and NGINX logs.

LTRIM

ltrim <column>

Trimming whitespace from left side of a string.

EXTRACT-REGEX-GROUPS

extract-regex-groups <column> <regex-with-groups>

Extracts data from a regex group into its own column.

PARSE-AS-CSV

parse-as-csv <column> <delimiter> [<header=true

false>]

FILTER-ROW-IF-MATCHED

filter-row-if-matched <column> <regex>

[DEPRECATED] Filters rows if the regex is matched. Use 'filter-rows-on' instead.

PARSE-AS-JSON

parse-as-json <column> [<depth>]

Parses a column as JSON.

SET COLUMN

set column <column> <jexl-expression>

Sets a column by evaluating a JEXL expression.

STEMMING

stemming <column>

Apply Porter Stemming on the column value.

COPY

copy <source> <destination> [<force=true

false>]

SET-COLUMN

set-column <column> <expression>

Sets a column the result of expression execution.

SPLIT-TO-COLUMNS

split-to-columns <column> <regex>

Splits a column into one or more columns around matches of the specified regular expression.

CLEANSE-COLUMN-NAME

cleanse-column-names

Sanatizes column names: trims, lowercases, and replaces all but [A-Z][a-z][0-9].with an underscore ''.

SET COLUMNS

set columns <columm>[,<column>*]

Sets the name of columns, in the order they are specified.

TITLECASE

titlecase <column>

Changes the column values to title case.

MERGE

merge <column1> <column2> <new-column> <separator>

Merges values from two columns using a separator into a new column.

TEXT-METRIC

text-metric <method> <column1> <column2> <destination>

Calculates the metric for comparing two string values.

SET FORMAT

set format csv <delimiter> <skip empty lines>

[DEPRECATED] Parses the predefined column as CSV. Use 'parse-as-csv' instead.

FORMAT-UNIX-TIMESTAMP

format-unix-timestamp <column> <format>

Formats a UNIX timestamp using the specified format

FILTER-ROW-IF-NOT-MATCHED

filter-row-if-not-matched <column> <regex>

Filters rows if the regex does not match

FILTER-ROW-IF-FALSE

filter-row-if-false <condition>

Filters rows if the condition evaluates to false

Created in 2020 by Google Inc.