Distinct Analytics

Plugin version: 2.11.0

De-duplicates input records so that all output records are distinct. Optionally takes a list of fields; when a list is given, all other fields are dropped and the distinct is performed on just the listed fields.

This plugin is used when you want to ensure that all output records are unique.
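The behavior described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not the plugin's implementation: the `distinct` function, its `fields` parameter, and the sample records are hypothetical names chosen for this sketch, and the first-seen output order is an assumption that the plugin itself does not promise.

```python
def distinct(records, fields=None):
    """Return unique records, optionally projected onto `fields` first.

    Hypothetical helper for illustration; not part of the plugin.
    """
    seen = set()
    output = []
    for record in records:
        # With no field list, every field of the record takes part in the comparison.
        names = fields if fields is not None else list(record)
        projected = {name: record[name] for name in names}
        # A frozenset of (name, value) pairs is hashable and ignores field order.
        key = frozenset(projected.items())
        if key not in seen:
            seen.add(key)
            output.append(projected)
    return output


records = [
    {"user": "alice", "item": "tea"},
    {"user": "alice", "item": "tea"},
    {"user": "alice", "item": "cake"},
]

all_fields = distinct(records)                   # compares whole records
users_only = distinct(records, fields=["user"])  # keeps only the user field

print(all_fields)
print(users_only)
```

With no field list, only the exact duplicate is removed. With a field list, the other fields are dropped first, so records that differ only in a dropped field collapse into one.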

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Fields | Yes | Optional. Comma-separated list of fields to perform the distinct on. If not given, all fields are used. |
| Number of Partitions | Yes | Optional. Number of partitions to use when grouping fields. If not specified, the execution framework will decide on the number to use. |
| Output Schema | Yes | Required. The output schema for the data. |
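To give some intuition for the Number of Partitions property, the sketch below shows one common way a grouping step can be split across partitions: each record is routed by a hash of its distinct key, so identical keys always land in the same partition and can be de-duplicated there. This is a conceptual model under that assumption, not a description of what the execution framework actually does. The function name `distinct_partitioned` and the use of `zlib.crc32` as the hash are choices made for this illustration.

```python
import zlib


def distinct_partitioned(records, fields, num_partitions):
    """De-duplicate by routing each key to one of `num_partitions` buckets.

    Hypothetical helper for illustration; not part of the plugin.
    """
    partitions = [set() for _ in range(num_partitions)]
    for record in records:
        key = tuple(record[f] for f in fields)
        # A deterministic hash sends equal keys to the same partition.
        index = zlib.crc32(repr(key).encode("utf-8")) % num_partitions
        partitions[index].add(key)
    # Each partition already holds unique keys, so concatenating them is enough.
    return [dict(zip(fields, key)) for part in partitions for key in part]


records = [
    {"user": "bob", "item": "donut"},
    {"user": "bob", "item": "donut"},
    {"user": "bob", "item": "coffee"},
]

# The partition count changes how the work is split, not which records come out.
results = {n: distinct_partitioned(records, ["user", "item"], n) for n in (1, 2, 4)}
for n, result in results.items():
    print(n, len(result))
```

In this model the partition count affects only parallelism and the order in which records are emitted. The set of distinct records is the same for every value.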

Example

| Property | Value |
| --- | --- |
| Fields | user,item,action |

This example takes the user, action, and item fields from input records and dedupes them so that every output record is a unique record with those three fields. If the input to the plugin is:

| user | item | action | timestamp |
| --- | --- | --- | --- |
| bob | donut | buy | 1000 |
| bob | donut | buy | 1000 |
| bob | donut | buy | 1001 |
| bob | coffee | buy | 1001 |
| bob | coffee | drink | 1010 |
| bob | donut | eat | 1050 |
| bob | donut | eat | 1080 |

then the output records will be:

| user | item | action |
| --- | --- | --- |
| bob | donut | buy |
| bob | coffee | buy |
| bob | coffee | drink |
| bob | donut | eat |
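The example can be reproduced with a short Python sketch. It models the semantics only and is not the plugin's code: the variable names are illustrative, and the first-seen ordering of the result is a property of this sketch rather than a guarantee of the plugin.

```python
columns = ("user", "item", "action", "timestamp")
rows = [
    ("bob", "donut", "buy", 1000),
    ("bob", "donut", "buy", 1000),
    ("bob", "donut", "buy", 1001),
    ("bob", "coffee", "buy", 1001),
    ("bob", "coffee", "drink", 1010),
    ("bob", "donut", "eat", 1050),
    ("bob", "donut", "eat", 1080),
]
records = [dict(zip(columns, row)) for row in rows]

# Same value as the Fields property in the example.
fields = ["user", "item", "action"]

# dict.fromkeys keeps the first occurrence of each key and drops later repeats.
unique_keys = dict.fromkeys(tuple(r[f] for f in fields) for r in records)
output = [dict(zip(fields, key)) for key in unique_keys]

for record in output:
    print(record)

# For comparison: with Fields left unset, whole records are compared,
# so only the exact duplicate (timestamp 1000) would be removed.
whole_record_count = len(set(rows))
print(whole_record_count)
```

The seven input records reduce to four once timestamp is dropped. If every field were compared instead, six records would remain, because only the two rows with timestamp 1000 are identical in all fields.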



Created in 2020 by Google Inc.