Sampling Aggregate Analytics
The Sampling Aggregate analytics plugin is available in the Hub.
Samples a large dataset flowing through this plugin to pull random records. Supports two types of sampling: Systematic Sampling and Reservoir Sampling.
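To illustrate the Reservoir Sampling option, here is a minimal sketch of the classic reservoir algorithm (Algorithm R). It is not taken from the plugin's source; the function name `reservoir_sample` and its parameters are hypothetical and only show how a fixed-size uniform sample can be drawn from a stream of unknown length.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    records: Iterable[T],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return up to `sample_size` records chosen uniformly from a stream.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < sample_size:
            # Fill the reservoir with the first `sample_size` records.
            reservoir.append(record)
        else:
            # Keep the new record with probability sample_size / (index + 1)
            # by drawing a slot from the inclusive range 0..index.
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = record
    return reservoir
```

Because only the reservoir is held in memory, this approach does not need the total record count in advance. If the stream holds fewer records than `sample_size`, all of them are returned.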
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Sample Size | Yes | Optional. The number of records to sample from the input records. Either 'samplePercentage' or 'sampleSize' should be specified for this plugin. |
Sample Percentage | Yes | Optional. The percentage of records to sample from the input records. Either 'samplePercentage' or 'sampleSize' should be specified for this plugin. |
Sampling Type | No | Required. The sampling algorithm used to sample the data. This can be either Systematic or Reservoir. Default is Systematic. |
Over Sampling Percentage | Yes | Optional. The percentage of additional records to include on top of the input sample size to account for oversampling. Required for Systematic Sampling. |
Random | Yes | Optional. Random float value between 0 and 1 to be used in Systematic Sampling. If not provided, the plugin generates a random value internally. |
Total Records | Yes | Optional. Total number of input records for Systematic Sampling. |
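The properties above suggest how Systematic Sampling could combine the sample size, oversampling percentage, random value, and total record count. The following is a sketch under stated assumptions, not the plugin's actual formula: it assumes the oversampling percentage inflates the sample size (rounded up), that records are picked at a fixed interval of total records divided by the inflated size, and that the random value in [0, 1) selects the starting offset within the first interval. The function name `systematic_sample` and its parameters are hypothetical, and the total record count is taken from the length of the input list.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(
    records: Sequence[T],
    sample_size: int,
    over_sampling_percentage: float = 0.0,
    random_value: Optional[float] = None,
) -> List[T]:
    """Pick evenly spaced records, starting at a random offset.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if over_sampling_percentage < 0:
        raise ValueError("over_sampling_percentage must be non-negative")
    total_records = len(records)
    if total_records == 0:
        return []

    # Assumption: oversampling adds extra records, rounded up.
    extra = math.ceil(sample_size * over_sampling_percentage / 100)
    target = min(sample_size + extra, total_records)

    # Fixed step between picks; always >= 1 because target <= total_records.
    interval = total_records / target

    if random_value is None:
        random_value = random.random()
    if not 0.0 <= random_value < 1.0:
        raise ValueError("random_value must be in [0, 1)")

    # Assumption: the random value sets the offset inside the first interval.
    start = random_value * interval
    last_index = total_records - 1
    # The clamp guards against floating-point rounding at the upper edge.
    return [
        records[min(int(start + i * interval), last_index)]
        for i in range(target)
    ]
```

With purely hypothetical inputs, `systematic_sample(list(range(10)), 3, 10, 0.5)` inflates the size from 3 to 4, uses an interval of 2.5 and a start of 1.25, and returns `[1, 3, 6, 8]`.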
Example
This example reads data from a stream, sorts the records alphabetically using an OrderBy plugin, and then uses Systematic Sampling to sample the input records based on the sample size and oversampling percentage given in the inputs below:
Property | Value |
---|---|
Sample Size | |
Sampling Type | Systematic |
Over Sampling Percentage | |
Total Records | |
If the aggregator receives the following input records:
id | name | salary | occupation |
---|---|---|---|
1 | John | 1000 | Artist |
2 | Kelly | 2000 | Singer |
3 | Kiara | 3000 | Scientist |
4 | Phoebe | 2500 | Farmer |
5 | Mike | 4000 | Baker |
6 | Avril | 4300 | Banker |
7 | Miley | 8700 | Actress |
8 | Katy | 6500 | Chef |
9 | Seth | 2300 | Miner |
10 | Ben | 9800 | Director |
After applying Systematic Sampling, the plugin emits 4 random records, based on the sample size and oversampling percentage provided in the inputs.
Created in 2020 by Google Inc.