Split to Rows directive

The SPLIT-TO-ROWS directive splits a column based on a separator into multiple records, making a copy of the record for each column value generated by splitting.

Syntax

split-to-rows :column 'regex'

The column is split based on the <separator>, which can be defined as a regular expression (regex).

Usage Notes

The SPLIT-TO-ROW directive takes a column, applies the regular expression separator, and then creates multiple rows from the split.

Regular expressions allows the use of complex search patterns when splitting the data in the column. It supports standard Java regular expression constructs.

If the <separator> regular expression pattern does not match any part of the input string, then no additional rows are generated.

Where there is a match, and the match splits the input string into n strings, then n rows created by copying other columns will be generated.

Note: This directive can only operate on columns of type string.

Examples

If we have a <separator> pattern of \n (a linefeed) over the string:

This will be split 1\nThis will be split 2

This will generate two output records with the column having these values:

[ { "column": "This will be split 1" }, { "column": "This will be split 2" } ]

Using this record as an example:

{ "id": 1, "codes": "USD|AUD|AMD|XCD", }

Applying this directive:

Note: A backslash is required to escape the pipe character (|) as it is an optional separator in a regex pattern.

This would result in four records being generated, with each split value being assigned to the column codes:

 

Created in 2020 by Google Inc.