Joiner Analytics

Use the Joiner analytics to combine data from multiple inputs. Joins are based on equality of the key fields. The Joiner supports inner and outer joins, as well as selection and renaming of output fields. You can add a Joiner at any stage in a data pipeline.

Because pipelines execute on either Spark or MapReduce, you don’t need to sort key columns before performing the join. As an added benefit, you can use the Wrangler to cleanse, blend, and transform the datasets before joining them.
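To make the join semantics concrete, here is a minimal Python sketch of an equality-based join over in-memory records. It is an illustration only and not the Joiner's implementation: the function name equi_join, the sample datasets, and the field names are all hypothetical, and "outer" here means a full outer join that pads missing fields with None. The plugin's own outer-join options may behave differently.

```python
def equi_join(left, right, key, how="inner"):
    """Join two lists of dict records where left[key] == right[key]."""
    # Collect field names in first-seen order so unmatched rows can be padded.
    fields = []
    for rec in left + right:
        for name in rec:
            if name not in fields:
                fields.append(name)

    # Group the right-hand records by key value; no sorting is needed.
    right_by_key = {}
    for rec in right:
        right_by_key.setdefault(rec[key], []).append(rec)

    joined, matched_keys = [], set()
    for lrec in left:
        matches = right_by_key.get(lrec[key], [])
        if matches:
            matched_keys.add(lrec[key])
        for rrec in matches:
            joined.append({**rrec, **lrec})
        if not matches and how == "outer":
            # Keep the unmatched left record, filling absent fields with None.
            joined.append({name: lrec.get(name) for name in fields})

    if how == "outer":
        # Keep right records whose key never matched a left record.
        for rrec in right:
            if rrec[key] not in matched_keys:
                joined.append({name: rrec.get(name) for name in fields})
    return joined


# Hypothetical sample inputs.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}]

inner_rows = equi_join(customers, orders, "id")
outer_rows = equi_join(customers, orders, "id", how="outer")
```

With these inputs, the inner join keeps only the record whose id appears in both datasets, while the outer join also keeps the unmatched records from each side.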

The Joiner analytics is listed under Analytics in the plugin palette. Click the Joiner to add it to a pipeline.

Join Fields

In the Fields area of the Joiner, you can perform the following tasks:

  • Remove fields from the join 

  • Add aliases to duplicate field names

Removing fields from the join

You can remove unnecessary fields from the join. Just uncheck the box next to any field you don’t want to include in the join, and then click Get Schema to refresh the Output Schema. All unchecked fields will be dropped when you run the pipeline.

However, if your datasets are very large, removing fields in the Joiner can have a performance impact. For more information, see https://cdap.atlassian.net/wiki/spaces/DOCS/pages/382042959.

Adding aliases to duplicate field names

Field names in the output schema must be unique. If the input schemas have field names that are identical, you can either use the Wrangler to rename the fields or add aliases in the Joiner. 

After you add aliases, click Get Schema to refresh the Output Schema.
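The effect of removing fields and adding aliases can be sketched in a few lines of Python. This is a conceptual model only, not how the Joiner builds its output schema: the function build_output_schema, the input names, and the field names are hypothetical. It takes the fields that remain checked, applies any aliases, and rejects a schema that still contains duplicate names.

```python
def build_output_schema(selected, aliases=None):
    """Return output field names for the selected (input, field) pairs.

    selected: list of (input_name, field_name) tuples left checked.
    aliases:  optional dict mapping (input_name, field_name) to an alias.
    """
    aliases = aliases or {}
    output = []
    for input_name, field in selected:
        # Use the alias if one was given, otherwise keep the original name.
        out_name = aliases.get((input_name, field), field)
        if out_name in output:
            raise ValueError(f"duplicate output field name: {out_name}")
        output.append(out_name)
    return output


# Hypothetical inputs: both datasets have an "id" field.
selected = [
    ("customers", "id"),
    ("customers", "name"),
    ("orders", "id"),
    ("orders", "total"),
]

# Aliasing the second "id" makes every output name unique.
schema = build_output_schema(selected, {("orders", "id"): "order_id"})
```

Calling build_output_schema(selected) without the alias raises ValueError, which mirrors the requirement that output field names be unique. Leaving ("orders", "id") out of the selected list instead would also resolve the conflict by dropping that field.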

 

Created in 2020 by Google Inc.