Sampling Aggregate Analytics

The Sampling Aggregate analytics plugin is available in the Hub.

Samples a large dataset flowing through this plugin to pull random records. Supports two types of sampling: Systematic Sampling and Reservoir Sampling.
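To illustrate what Reservoir Sampling does, here is a minimal sketch of the classic technique (often called Algorithm R), which keeps a uniform random sample from a stream whose length is not known in advance. This is not the plugin's own code; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], sample_size: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to `sample_size` records (hypothetical helper)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < sample_size:
            # Fill the reservoir with the first `sample_size` records.
            reservoir.append(record)
        else:
            # Keep the new record with probability sample_size / (index + 1)
            # by drawing a slot in [0, index] and replacing only if it falls
            # inside the reservoir.
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = record
    return reservoir
```

If the stream holds fewer records than `sample_size`, the sketch simply returns all of them.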

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Sample Size | Yes | Optional. The number of records to sample from the input records. Either 'samplePercentage' or 'sampleSize' should be specified for this plugin. |
| Sample Percentage | Yes | Optional. The percentage of records to sample from the input records. Either 'samplePercentage' or 'sampleSize' should be specified for this plugin. |
| Sampling Type | No | Required. The sampling algorithm used to sample the data. This can be either Systematic or Reservoir. Default is Systematic. |
| Over Sampling Percentage | Yes | Optional. The percentage of additional records to include on top of the input sample size to account for oversampling. Required for Systematic Sampling. |
| Random | Yes | Optional. Random float value between 0 and 1 to be used in Systematic Sampling. If not provided, the plugin generates a random value internally. |
| Total Records | Yes | Optional. Total number of input records for Systematic Sampling. |
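The rule that either 'sampleSize' or 'samplePercentage' should be specified can be sketched as a small validation step. The helper below is hypothetical and rests on several assumptions that this page does not confirm: that the percentage is taken of the total record count, that fractional results round up, and that an explicit size wins when both properties are given.

```python
import math
from typing import Optional


def resolve_sample_size(total_records: int,
                        sample_size: Optional[int] = None,
                        sample_percentage: Optional[float] = None) -> int:
    """Work out how many records to sample (hypothetical helper, not the plugin's API)."""
    if sample_size is None and sample_percentage is None:
        raise ValueError("Specify either 'sampleSize' or 'samplePercentage'.")
    if sample_size is not None:
        # Assumption: an explicit size takes precedence over a percentage.
        return min(sample_size, total_records)
    # Assumption: the percentage applies to the total record count, rounded up.
    return min(total_records, math.ceil(total_records * sample_percentage / 100))
```

For example, under these assumptions 25 percent of 10 records resolves to 3 records.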

Example

This example reads data from a stream, sorts it alphabetically using an OrderBy plugin, and then uses Systematic Sampling to sample the input records according to the sample size and oversampling percentage given below:

| Property | Value |
| --- | --- |
| Sample Size | 3 |
| Sampling Type | Systematic |
| Over Sampling Percentage | 20 |
| Total Records | 10 |

Suppose the aggregator receives the following input records:

| id | name | salary | occupation |
| --- | --- | --- | --- |
| 1 | John | 1000 | Artist |
| 2 | Kelly | 2000 | Singer |
| 3 | Kiara | 3000 | Scientist |
| 4 | Phoebe | 2500 | Farmer |
| 5 | Mike | 4000 | Baker |
| 6 | Avril | 4300 | Banker |
| 7 | Miley | 8700 | Actress |
| 8 | Katy | 6500 | Chef |
| 9 | Seth | 2300 | Miner |
| 10 | Ben | 9800 | Director |

After applying Systematic Sampling, the plugin emits 4 random records, based on the sample size and oversampling percentage provided in the inputs (a sample size of 3 plus 20% oversampling gives 3.6, which yields 4 records).
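The arithmetic behind this example can be sketched as follows. This is an illustrative take on textbook systematic sampling, not the plugin's actual implementation: the function name `systematic_sample`, the round-up of the oversampled size, and the way the random value selects the starting offset are all assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(records: Sequence[T], sample_size: int,
                      over_sampling_pct: float = 0.0,
                      rand: Optional[float] = None) -> List[T]:
    """Pick evenly spaced records starting from a random offset (hypothetical helper)."""
    total = len(records)
    # Assumption: target = sample size plus the oversampling share, rounded up.
    target = min(total, math.ceil(sample_size * (100 + over_sampling_pct) / 100))
    if target <= 0:
        return []
    # Fixed step between consecutive picks.
    interval = total // target
    # `rand` mirrors the "Random" property; generate one if it is not supplied.
    if rand is None:
        rand = random.random()
    # Clamp so that rand == 1.0 still yields a valid starting offset.
    start = min(int(rand * interval), interval - 1)
    return [records[start + i * interval] for i in range(target)]


ids = list(range(1, 11))  # the ten record ids from the example input
print(systematic_sample(ids, sample_size=3, over_sampling_pct=20, rand=0.0))
# prints [1, 3, 5, 7]
```

With 10 records and a target of 4, the step is 2, so a random value of 0.0 starts at the first record and picks ids 1, 3, 5, and 7. A different random value shifts the starting offset and therefore the records selected.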

Created in 2020 by Google Inc.