Join Types

Required input stages determine the type of the join. The Joiner supports the following types of joins:

  • Inner

  • Outer

To show an example of each type of join, we’ll join records from the Customer and Purchase source datasets. 

We’ll set the join condition to customer_id and customer_id (with alias set to customer_id2 for the Purchase key). 

The Customer input includes the following records:

customer_id

state

1

CA

2

TX

3

NY

4

FL

5

CA

6

NV

The Purchase input includes the following records:

customer_id2

item

price

1

donut

0.80

1

coffee

2.05

2

donut

1.50

2

plate

0.50

3

tea

1.99

5

cookie

0.50

Inner Join

Inner join outputs fields that have matching values in both input datasets.

Continuing with our example, when the Joiner performs an inner join on the sample datasets, Customer and Purchase, it results in the following output:

customer _id

state

item

price

1

CA

donut

.80

1

CA

coffee

2.05

2

TX

donut

1.50

2

TX

plate

.50

3

NY

tea

1.99

5

CA

cookie

.50

Outer Join

Outer join outputs fields from the required inputs and matching fields from the non-required inputs. To configure an outer join, select the required inputs for the join. For example, in the following Joiner, Customer is the required input, which means the output will include all records in the Customer input and matching records in the Purchase input. 



Continuing with our example, when the Joiner performs an outer join with Customer set as the required input, it results in the following output:

customer _id

state

item

price

1

CA

donut

.80

1

CA

coffee

2.05

2

TX

donut

1.50

2

TX

plate

.50

3

NY

tea

1.99

4

FL

 

 

5

CA

cookie

.50

6

NV

 

 

Likewise, if you select Purchase as the required input, the output includes all records in the Purchase input and matching records from the Customer input.

For outer joins, if you join multiple inputs, you can select multiple required inputs. The join behavior is the same with one required input or many. The required output includes all records from the required inputs and matching records for the non-required inputs.

Created in 2020 by Google Inc.