Pivot Analytics
The Pivot Analytics plugin is available in the Hub.
The Pivot analytics plugin provides the ability to pivot data out-of-the-box, so that users can transpose, reorganize, group, and aggregate data.
This plugin can be used when user needs to transpose specific rows as columns, and generate aggregate results in those columns.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Pivot Columns | Yes | Required. Specifies a list of fields pivot as columns. Usually, these fields contain a fixed set of values, which are converted into columns. When a single field is selected, the number of columns added is equal to the number of unique values in the field multiplied by the number of aggregates. When multiple fields are selected, the number of columns is the product of unique values in each column multiplied by the number of aggregates. Maximum number of pivot columns allowed are 2. Allowed column types are: int, long, double, float, string, boolean. |
Pivot Rows | Yes | Required. Specifies a single field in the input schema. The unique values in this field will become the rows of the output from the Pivot transformation. |
Aggregates | Yes | Required. Specifies a set of aggregate functions to be applied to the input data for each group (pivot rows). A function has the syntax alias:function_name(column). Should specify at least 1 aggregate function. |
Number of Partitions | Yes | Optional. Number of partitions to use when aggregating. If not specified, the execution framework will decide how many to use. Default is |
Default Value | Yes | Optional. The default value to use in case a cell does not contain a value after pivoting. Default is |
Field Aliases | Yes | Optional. Sets up aliases for the fields generated by pivoting the data. By default, columns are of the form |
Error Handling: On Error | Yes | Optional. How to handle cases when not all unique values for a column are provided. Default is Skip error. |
Example
Consider the dataset below depicting sales (numbers) of specific products in every quarter for a sports goods store.
Quarter | Product | Brand | Sales |
---|---|---|---|
Q1 | Shoes | Nike | 50 |
Q1 | Shirts | Nike | 20 |
Q1 | Socks | Reebok | 40 |
Q1 | Shirts | Reebok | 60 |
Q1 | Shoes | Reebok | 50 |
Q2 | Shoes | Nike | 20 |
Q2 | Shoes | Reebok | 30 |
Q2 | Socks | Nike | 40 |
Q3 | Shoes | Nike | 50 |
Q3 | Shoes | Reebok | 30 |
Q3 | Socks | Reebok | 40 |
Q3 | Socks | Nike | 20 |
Q4 | Shoes | Reebok | 10 |
Q4 | Shirts | Reebok | 20 |
Q4 | Socks | Reebok | 30 |
Q4 | Shoes | Nike | 40 |
Q4 | Shirts | Nike | 50 |
Q4 | Socks | Nike | 60 |
With the configuration containing single column set as the following: Pivot Columns: Quarter=Q1,Q2,Q3,Q4
Pivot Row: Product
Aggregations: total:sum(Sales)
Default value: 0
The output data will provide the sum of sales by each product for each quarter:
Product | Q1_total | Q2_total | Q3_total | Q4_total |
---|---|---|---|---|
Shoes | 100 | 50 | 80 | 50 |
Socks | 40 | 40 | 60 | 90 |
Shirts | 80 | 0 | 0 | 70 |
With the configuration containing multiple columns set like following:
Pivot Columns: Quarter=Q1,Q2,Q3,Q4;Product=Shoes,Socks,Shirts
Pivot Rows: Brand
Aggregations: sum(Sales)
The output data will provide sum of sales by a brand for each quarter and product:
Brand | Shoes_Q1 | Shoes_Q2 | Shoes_Q3 | Shoes_Q4 | Socks_Q1 | Socks_Q2 | Socks_Q3 | Socks_Q4 | Shirts_Q1 | Shirts_Q2 | Shirts_Q3 | Shirts_Q4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Nike | 50 | 20 | 50 | 40 | null | 40 | null | 60 | 20 | null | null | 50 |
Reebok | 50 | 30 | 30 | 10 | 40 | null | 40 | 30 | 60 | null | null | 20 |
Created in 2020 by Google Inc.