Joiner Analytics
Use the Joiner analytics plugin to combine data from multiple inputs. Joins are equality-based: records are matched when their join key values are equal. The Joiner supports inner and outer joins, and lets you select and rename output fields. You can add a Joiner transformation at any stage in a data pipeline.
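As a conceptual sketch only, and not the Joiner's actual implementation, the following Python models what an equality join with inner and outer semantics does. It builds a hash index on one input, so neither input needs to be sorted. The function name equi_join, the join type strings, and the sample field names are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical record type: one row as a field-name -> value mapping.
Record = Dict[str, Any]


def equi_join(left: List[Record], right: List[Record],
              left_key: str, right_key: str, how: str = "inner") -> List[Record]:
    """Conceptual equality join (not the Joiner plugin's code).

    how: "inner", "left_outer", or "full_outer" (hypothetical names).
    Field names are assumed not to collide across inputs; on a collision
    the right-hand value would silently overwrite the left-hand one.
    """
    if how not in ("inner", "left_outer", "full_outer"):
        raise ValueError(f"unsupported join type: {how}")

    # Hash the right input by key (key -> positions); no sorting required.
    index: Dict[Any, List[int]] = {}
    for pos, rrec in enumerate(right):
        index.setdefault(rrec[right_key], []).append(pos)

    left_fields = {f for rec in left for f in rec}
    right_fields = {f for rec in right for f in rec}
    matched_right = set()  # positions of right records that found a partner
    out: List[Record] = []

    for lrec in left:
        positions = index.get(lrec[left_key], [])
        for pos in positions:
            matched_right.add(pos)
            out.append({**lrec, **right[pos]})
        if not positions and how != "inner":
            # Outer join: keep the unmatched left record, right fields set to None.
            out.append({**lrec, **dict.fromkeys(right_fields)})

    if how == "full_outer":
        # Also keep right records that matched nothing, left fields set to None.
        for pos, rrec in enumerate(right):
            if pos not in matched_right:
                out.append({**dict.fromkeys(left_fields), **rrec})
    return out


if __name__ == "__main__":
    # Deliberately unsorted inputs.
    customers = [{"cust_id": 2, "name": "Bo"}, {"cust_id": 1, "name": "Ann"},
                 {"cust_id": 3, "name": "Cy"}]
    orders = [{"order_cust": 1, "item": "pen"}, {"order_cust": 2, "item": "ink"},
              {"order_cust": 9, "item": "cap"}]
    for row in equi_join(customers, orders, "cust_id", "order_cust", how="full_outer"):
        print(row)
```

A hash index is just one way to avoid a sort step; in a real pipeline, the execution engine (Spark or MapReduce) chooses its own join strategy.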
Because pipelines execute on either Spark or MapReduce, you don’t need to sort key columns before performing the join. As an added benefit, you can use the Wrangler to cleanse, blend, and transform the datasets before joining them.
The Joiner analytics plugin is listed under Analytics in the plugin palette. To add it to a pipeline, click Joiner.
Join Fields
In the Fields area of the Joiner, you can perform the following tasks:
Remove fields from the join
Add aliases to duplicate field names
Removing fields from the join
You can remove unnecessary fields from the join output. Uncheck the box next to each field you don’t want to include, and then click Get Schema to refresh the Output Schema. Unchecked fields are dropped when you run the pipeline.
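Conceptually, unchecking fields is a projection: the output schema keeps only the checked fields, and each record is trimmed to that schema. The Python sketch below models this behavior; output_schema and project are hypothetical helper names, and the field names are illustrative.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def output_schema(input_fields: List[str], checked: Iterable[str]) -> List[str]:
    """Return the checked fields in input order (a rough model of Get Schema)."""
    keep = set(checked)
    unknown = keep - set(input_fields)
    if unknown:
        raise KeyError(f"not in input schema: {sorted(unknown)}")
    return [f for f in input_fields if f in keep]


def project(records: Iterable[Record], schema: List[str]) -> List[Record]:
    """Trim each record to the output schema; unchecked fields are dropped."""
    return [{f: rec[f] for f in schema} for rec in records]


if __name__ == "__main__":
    joined = [{"cust_id": 1, "name": "Ann", "order_cust": 1, "item": "pen"}]
    schema = output_schema(["cust_id", "name", "order_cust", "item"], ["name", "item"])
    print(schema)                   # ['name', 'item']
    print(project(joined, schema))  # [{'name': 'Ann', 'item': 'pen'}]
```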
Note that if your datasets are very large, removing fields in the Joiner can affect performance. For more information, see Optimizing Joiner Performance.
Adding aliases to duplicate field names
Field names in the output schema must be unique. If two or more input schemas contain the same field name, you can either rename the fields with the Wrangler before the join or add aliases in the Joiner.
After you add aliases, click Get Schema to refresh the Output Schema.
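The uniqueness rule can be modeled as a check over (input, field) pairs after aliases are applied. In this Python sketch, the (stage name, field name) representation, the resolve_output_names helper, and the sample names are hypothetical; it only illustrates why aliasing one of two colliding fields makes the output schema valid.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (input stage name, field name).
Field = Tuple[str, str]


def resolve_output_names(fields: List[Field], aliases: Dict[Field, str]) -> Dict[Field, str]:
    """Map each input field to its output name, applying any alias.

    Raises ValueError if an output name is still duplicated.
    """
    names = {f: aliases.get(f, f[1]) for f in fields}
    counts = Counter(names.values())
    dupes = sorted(n for n, c in counts.items() if c > 1)
    if dupes:
        raise ValueError(f"duplicate output field names: {dupes}")
    return names


if __name__ == "__main__":
    fields = [("customers", "id"), ("customers", "name"),
              ("orders", "id"), ("orders", "item")]
    try:
        resolve_output_names(fields, {})
    except ValueError as err:
        print(err)  # duplicate output field names: ['id']
    names = resolve_output_names(fields, {("orders", "id"): "order_id"})
    print(sorted(names.values()))  # ['id', 'item', 'name', 'order_id']
```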
Created in 2020 by Google Inc.