Union Splitter Transformation

Plugin version: 2.11.0

The Union Splitter splits data by a union schema so that type-specific logic can be applied downstream.

The Union Splitter emits records to different ports depending on the schema of a particular field or of the entire record. If no field is specified, each record is emitted to a port named after its record schema. If a field is specified, the schema for that field must be a union of supported schemas. All schemas except maps, arrays, unions, and enums are supported. For each input record, the value of that field is examined, and the record is emitted to the port corresponding to that value's schema in the union.

For record schemas, the output port will be the name of the record schema. For simple types, the output port will be the schema type in lowercase (‘null’, ‘bool’, ‘bytes’, ‘int’, ‘long’, ‘float’, ‘double’, or ‘string’).
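The routing rule above can be sketched in Python. This is a minimal illustration, not the plugin's implementation. Record, Int, Long, Float, Double, port_for_value, and route are all hypothetical names. The wrapper classes exist only because plain Python numbers cannot distinguish int from long, or float from double, the way Java's boxed types can.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical stand-ins for Java's Integer/Long/Float/Double, so the sketch can
# tell int from long and float from double (plain Python int/float cannot).
class Int(int):
    pass


class Long(int):
    pass


class Float(float):
    pass


class Double(float):
    pass


@dataclass
class Record:
    """Hypothetical stand-in for a StructuredRecord: schema name plus field values."""
    schema_name: str
    fields: Dict[str, Any] = field(default_factory=dict)


# Port names for simple types, as listed in the plugin description.
SIMPLE_TYPE_PORTS = {
    type(None): "null",
    bool: "bool",
    bytes: "bytes",
    Int: "int",
    Long: "long",
    Float: "float",
    Double: "double",
    str: "string",
}


def port_for_value(value: Any) -> str:
    """Return the output port for one value of the union field."""
    if isinstance(value, Record):
        return value.schema_name  # record schemas: port is the record name
    port = SIMPLE_TYPE_PORTS.get(type(value))
    if port is None:
        # Maps, arrays, enums, and other types are not supported.
        raise ValueError(f"unsupported value type: {type(value).__name__}")
    return port


def route(record: Record, union_field: Optional[str] = None) -> str:
    """Choose the output port for a whole input record."""
    if union_field is None:
        return record.schema_name  # no field: port named after the record schema
    return port_for_value(record.fields[union_field])
```

For example, a record whose ‘item’ field holds Int(7) is routed to the ‘int’ port, and a None value goes to the ‘null’ port.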

Configuration

Property                  Macro Enabled?  Description
Union field to split on   No              Required. The union field to split on. The schema for the field must be a union of supported schemas. All schemas except maps, arrays, unions, and enums are supported. Nulls are supported: records whose field value is null are sent to the ‘null’ port.
Modify Schema             No              Optional. Whether to modify the output schema to remove the union. For example, suppose the field ‘x’ is a union of int and long. If Modify Schema is true, the schema for field ‘x’ is just int on the ‘int’ port and just long on the ‘long’ port. If Modify Schema is false, the output schema for each port is the same as the input schema. Default is true.
Output Schema             No              Required. The output schema for the data.
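The effect of Modify Schema on each port's output schema can be sketched as follows. The list-of-pairs schema model and the output_schemas function are hypothetical. The sketch also assumes each union branch is written as its port name, meaning a lowercase simple type or a record name.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical schema model: a field type is a single type name
# ("long", "string", "itemMeta", ...) or a list of names for a union.
FieldType = Union[str, List[str]]
Fields = List[Tuple[str, FieldType]]


def output_schemas(input_fields: Fields, union_field: str,
                   modify_schema: bool = True) -> Dict[str, Fields]:
    """Return the output schema for each port, keyed by port name.

    Assumes each union branch is written as its port name, i.e. the
    lowercase simple type or the record name.
    """
    branches = dict(input_fields)[union_field]
    if not isinstance(branches, list):
        raise ValueError(f"field {union_field!r} is not a union")
    schemas: Dict[str, Fields] = {}
    for branch in branches:
        if modify_schema:
            # Replace the union with the single branch type for this port.
            schemas[branch] = [(name, branch if name == union_field else ftype)
                               for name, ftype in input_fields]
        else:
            # Every port keeps the input schema unchanged.
            schemas[branch] = list(input_fields)
    return schemas
```

Take input fields id: long, user: string, and item: [int, long, itemMeta]. The ‘int’ port then gets [('id', 'long'), ('user', 'string'), ('item', 'int')]. With modify_schema=False, all three ports get the input schema unchanged.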

Example

Suppose the Union Splitter is configured to split on the ‘item’ field:

Property                  Value
Union field to split on   item
Modify Schema             true

Suppose the Union Splitter receives records with schema:

name  type
id    long
user  string
item  [ int, long, itemMeta ]

The ‘item’ field is a union of int, long, and a record named ‘itemMeta’ with schema:

name  type
id    long
desc  string

This means the Union Splitter will have three output ports, one for each schema in the union.
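The port list for this example can be derived from the input schema. The sketch below assumes the Avro-style JSON schema representation used for CDAP schemas. The outer record name "event" and the ports_for_union function are hypothetical. It follows the port naming described earlier, so a boolean branch (Avro type name "boolean") maps to the ‘bool’ port.

```python
import json
from typing import List

# The example input schema in Avro-style JSON. The outer record name "event"
# is hypothetical; the fields match the tables above.
INPUT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "user", "type": "string"},
    {"name": "item", "type": [
      "int",
      "long",
      {"type": "record", "name": "itemMeta", "fields": [
        {"name": "id", "type": "long"},
        {"name": "desc", "type": "string"}
      ]}
    ]}
  ]
}
""")

# Simple-type port names from the plugin description, keyed by Avro type name.
SIMPLE_PORT_NAMES = {"null": "null", "boolean": "bool", "bytes": "bytes", "int": "int",
                     "long": "long", "float": "float", "double": "double", "string": "string"}


def ports_for_union(schema: dict, field_name: str) -> List[str]:
    """List the output ports implied by a union field, in branch order."""
    field_type = next(f["type"] for f in schema["fields"] if f["name"] == field_name)
    if not isinstance(field_type, list):
        raise ValueError(f"field {field_name!r} is not a union")
    ports = []
    for branch in field_type:
        if isinstance(branch, str) and branch in SIMPLE_PORT_NAMES:
            ports.append(SIMPLE_PORT_NAMES[branch])
        elif isinstance(branch, dict) and branch.get("type") == "record":
            ports.append(branch["name"])  # records: port is the record name
        else:
            # Maps, arrays, enums, nested unions, and named-type references are not handled.
            raise ValueError(f"unsupported union branch: {branch!r}")
    return ports
```

For the schema above, ports_for_union(INPUT_SCHEMA, "item") returns ['int', 'long', 'itemMeta'].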

If a record contains an integer for the ‘item’ field, it will be emitted to the ‘int’ port with output schema:

name  type
id    long
user  string
item  int

If a record contains a long for the ‘item’ field, it will be emitted to the ‘long’ port with output schema:

name  type
id    long
user  string
item  long

If a record contains a StructuredRecord with the itemMeta schema for the ‘item’ field, it will be emitted to the ‘itemMeta’ port with output schema:

name  type
id    long
user  string
item  itemMeta



Created in 2020 by Google Inc.