Flatten directive

The FLATTEN directive separates the elements in a repeated field into individual records. Use the FLATTEN directive for arrays. To flatten a record column with nested value, use the FLATTEN-RECORD directive.

Note: In releases before CDAP 6.7.0, due to a bug, the FLATTEN directive worked with both an array and a record column with nested values. Starting in CDAP 6.7.0, the FLATTEN directive only works with arrays.

Syntax

flatten :column [,:column]*

The column is the name of a column that is a JSON array.

Usage Notes

The FLATTEN directive is useful for the flexible exploration of repeated data.

To maintain the association between each flattened value and the other fields in the record, the FLATTEN directive copies all of the other columns into each new record.

Examples

Case 1

The array in col2 is flattened and the values in col3 are repeated for each value of col2:

Input Record

[ { "col1": "A" }, { "col1": "B" }, { "col2": [x1, y1], "col3": 10 }, { "col2": [x2, y2], "col3": 11 } ]

Output Record

[ { "col1": "A" }, { "col1": "B" }, { "col2": "x1", "col3": 10 }, { "col2": "y1", "col3": 10 }, { "col2": "x2", "col3": 11 }, { "col2": "y2", "col3": 11 } ]

Case 2

The arrays in col2 and col3 are flattened:

Input Record

Output Record

Case 3

The arrays in col2 and col3 are flattened:

Input Record

Output Record

Case 4

Using this record as an example:

The directive would result in these three distinct records:

x

y

z

x

y

z

5

"a string"

1

5

"a string"

2

5

"a string"

3

The directive takes a single argument, which must be an array (the z column in this example). In this case, using the "all" (*) wildcard as the argument to flatten is not supported and would return an error.

Created in 2020 by Google Inc.