The Repartitioner analytics plugin is available in the Hub.

This plugin re-partitions a Spark RDD.

Configuration

| Property | Macro Enabled? | Description |
|---|---|---|
| Partitions | Yes | Required. Number of partitions to use when grouping data. If not specified, the execution framework will decide on the number to use. Default is 1. |
| Shuffle Data | Yes | Required. Specifies whether the records have to be shuffled. Default is false. |
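
How Partitions and Shuffle Data interact can be pictured with a toy model in plain Python. This is only a sketch: it does not use Spark and is not the plugin's implementation. It assumes the two properties behave like the arguments of Spark's `coalesce(numPartitions, shuffle)`, where skipping the shuffle merges whole partitions (so the partition count can only shrink), while shuffling redistributes individual records. The function name `repartition_model` and the sample data are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repartition_model(
    partitions: Sequence[Sequence[T]],
    num_partitions: int,
    shuffle: bool = False,
) -> List[List[T]]:
    """Toy model of re-partitioning (hypothetical helper, not Spark code)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    if shuffle:
        # With a shuffle, individual records are redistributed (round-robin
        # here), so the partition count can grow or shrink and sizes even out.
        result: List[List[T]] = [[] for _ in range(num_partitions)]
        records = [record for part in partitions for record in part]
        for i, record in enumerate(records):
            result[i % num_partitions].append(record)
        return result

    # Without a shuffle, whole partitions are merged into fewer partitions;
    # the count cannot exceed the current number of partitions.
    target = min(num_partitions, len(partitions))
    if target == 0:
        return []
    result = [[] for _ in range(target)]
    for i, part in enumerate(partitions):
        result[i * target // len(partitions)].extend(part)
    return result


# Four skewed input partitions (made-up sample data).
data = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]

merged = repartition_model(data, 2, shuffle=False)
balanced = repartition_model(data, 2, shuffle=True)
grown = repartition_model(data, 8, shuffle=False)

print([len(p) for p in merged])    # sizes after merging whole partitions
print([len(p) for p in balanced])  # sizes after redistributing records
print(len(grown))                  # partition count when asking for more without a shuffle
```

In this model, leaving Shuffle Data at false keeps the skew of the input and cannot raise the partition count, whereas setting it to true evens out partition sizes at the cost of moving every record.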