Pivot Analytics

The Pivot Analytics plugin is available in the Hub.

The Pivot analytics plugin provides the ability to pivot data out-of-the-box, so that users can transpose, reorganize, group, and aggregate data.

This plugin can be used when user needs to transpose specific rows as columns, and generate aggregate results in those columns.

Configuration

Property

Macro Enabled?

Description

Property

Macro Enabled?

Description

Pivot Columns

Yes

Required. Specifies a list of fields pivot as columns. Usually, these fields contain a fixed set of values, which are converted into columns. When a single field is selected, the number of columns added is equal to the number of unique values in the field multiplied by the number of aggregates. When multiple fields are selected, the number of columns is the product of unique values in each column multiplied by the number of aggregates. Maximum number of pivot columns allowed are 2. Allowed column types are: int, long, double, float, string, boolean.

Pivot Rows

Yes

Required. Specifies a single field in the input schema. The unique values in this field will become the rows of the output from the Pivot transformation.

Aggregates

Yes

Required. Specifies a set of aggregate functions to be applied to the input data for each group (pivot rows). A function has the syntax alias:function_name(column). Should specify at least 1 aggregate function.

Number of Partitions

Yes

Optional. Number of partitions to use when aggregating. If not specified, the execution framework will decide how many to use.

Default is 1.

Default Value

Yes

Optional. The default value to use in case a cell does not contain a value after pivoting.

Default is null.

Field Aliases

Yes

Optional. Sets up aliases for the fields generated by pivoting the data. By default, columns are of the form <unique_value_in_pivot_column>_<aggregation_alias>.

Error Handling: On Error

Yes

Optional. How to handle cases when not all unique values for a column are provided.

Default is Skip error.

Example

Consider the dataset below depicting sales (numbers) of specific products in every quarter for a sports goods store.

Quarter

Product

Brand

Sales

Quarter

Product

Brand

Sales

Q1

Shoes

Nike

50

Q1

Shirts

Nike

20

Q1

Socks

Reebok

40

Q1

Shirts

Reebok

60

Q1

Shoes

Reebok

50

Q2

Shoes

Nike

20

Q2

Shoes

Reebok

30

Q2

Socks

Nike

40

Q3

Shoes

Nike

50

Q3

Shoes

Reebok

30

Q3

Socks

Reebok

40

Q3

Socks

Nike

20

Q4

Shoes

Reebok

10

Q4

Shirts

Reebok

20

Q4

Socks

Reebok

30

Q4

Shoes

Nike

40

Q4

Shirts

Nike

50

Q4

Socks

Nike

60

With the configuration containing single column set as the following: Pivot Columns: Quarter=Q1,Q2,Q3,Q4 Pivot Row: Product Aggregations: total:sum(Sales) Default value: 0

The output data will provide the sum of sales by each product for each quarter:

Product

Q1_total

Q2_total

Q3_total

Q4_total

Product

Q1_total

Q2_total

Q3_total

Q4_total

Shoes

100

50

80

50

Socks

40

40

60

90

Shirts

80

0

0

70

With the configuration containing multiple columns set like following:

Pivot Columns: Quarter=Q1,Q2,Q3,Q4;Product=Shoes,Socks,Shirts Pivot Rows: Brand Aggregations: sum(Sales)

The output data will provide sum of sales by a brand for each quarter and product:

Brand

Shoes_Q1

Shoes_Q2

Shoes_Q3

Shoes_Q4

Socks_Q1

Socks_Q2

Socks_Q3

Socks_Q4

Shirts_Q1

Shirts_Q2

Shirts_Q3

Shirts_Q4

Brand

Shoes_Q1

Shoes_Q2

Shoes_Q3

Shoes_Q4

Socks_Q1

Socks_Q2

Socks_Q3

Socks_Q4

Shirts_Q1

Shirts_Q2

Shirts_Q3

Shirts_Q4

Nike

50

20

50

40

null

40

null

60

20

null

null

50

Reebok

50

30

30

10

40

null

40

30

60

null

null

20

Created in 2020 by Google Inc.