Dynamic Multiple Fileset Sink (Deprecated)
Note: Datasets and the Dynamic Multiple Fileset Sink are deprecated and will be removed in CDAP 7.0.0.
This plugin is normally used with the Multiple Database Table batch source to write records from multiple databases into multiple filesets in text format. Each fileset it writes to contains a single ‘ingesttime’ partition, whose value is the logical start time of the pipeline run.
The plugin expects the filesets it writes to be declared as pipeline arguments, where the key is ‘multisink.[fileset]’ and the value is the fileset schema. Normally, the Multiple Database Table source sets those pipeline arguments, but they can also be set manually or by an Action plugin in your pipeline, such as an HTTP Argument Setter.
The sink expects each record to contain a special Split Field that determines which fileset the record is written to. For example, suppose the split field is ‘tablename’. A record whose ‘tablename’ field is set to ‘activity’ will be written to the ‘activity’ fileset.
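The routing rule described above can be sketched in a few lines of Python. This only illustrates the semantics and is not the plugin's implementation; the function name `route_by_split_field` and the representation of records as in-memory dicts are assumptions made for the example.

```python
from collections import defaultdict


def route_by_split_field(records, split_field="tablename"):
    """Group records by the value of their split field (hypothetical helper)."""
    by_fileset = defaultdict(list)
    for record in records:
        # The value of the split field names the target fileset.
        by_fileset[record[split_field]].append(record)
    return dict(by_fileset)


records = [
    {"id": 0, "name": "Samuel", "tablename": "accounts"},
    {"userid": 0, "item": "shirt123", "action": "view", "tablename": "activity"},
    {"userid": 1, "item": "cola", "action": "buy", "tablename": "activity"},
]

routed = route_by_split_field(records)
print(sorted(routed))  # ['accounts', 'activity']
```

Records with different shapes can share one input stream because only the split field is inspected when choosing the destination.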
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Split Field | No | Optional. The name of the field that determines which fileset each record is written to. Defaults to ‘tablename’. |
Field Delimiter | No | Optional. The delimiter used to separate record fields. Defaults to the tab character. |
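To make the Field Delimiter setting concrete, the sketch below joins a record's field values into one line of text. It is an illustration under stated assumptions rather than the plugin's actual writer: `format_record` is a hypothetical helper, and the sketch assumes values are written in schema field order with no quoting or escaping.

```python
def format_record(record, field_names, delimiter="\t"):
    """Join the named fields of a record into one delimited text line.

    Hypothetical helper: assumes no quoting or escaping of values.
    """
    return delimiter.join(str(record[name]) for name in field_names)


record = {"userid": 0, "item": "shirt123", "action": "view"}
fields = ["userid", "item", "action"]

tab_line = format_record(record, fields)         # default: tab-delimited
comma_line = format_record(record, fields, ",")  # comma-delimited
print(comma_line)  # 0,shirt123,view
```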
Example
This example uses a comma to delimit record fields:
Property | Value |
---|---|
Field Delimiter | , |
Suppose the input records are:
id | name | email | tablename |
---|---|---|---|
0 | Samuel | sjax@example.net | accounts |
1 | Alice | a@example.net | accounts |
userid | item | action | tablename |
---|---|---|---|
0 | shirt123 | view | activity |
0 | carxyz | view | activity |
0 | shirt123 | buy | activity |
0 | coffee | view | activity |
1 | cola | buy | activity |
The plugin expects two pipeline arguments that tell it to write the first two records to an ‘accounts’ fileset and the last five records to an ‘activity’ fileset:
multisink.accounts =
{
  "type": "record",
  "name": "accounts",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "name", "type": "string" },
    { "name": "email", "type": [ "string", "null" ] }
  ]
}
multisink.activity =
{
  "type": "record",
  "name": "activity",
  "fields": [
    { "name": "userid", "type": "long" },
    { "name": "item", "type": "string" },
    { "name": "action", "type": "string" }
  ]
}
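When the arguments are set manually rather than by the Multiple Database Table source, each value is the fileset schema serialized as a JSON string. The following is a minimal sketch, assuming the schemas are built as Python dicts; the helper name `multisink_arguments` is hypothetical, and how the resulting map is handed to the pipeline run is left out.

```python
import json


def multisink_arguments(schemas):
    """Map each fileset name to a 'multisink.<fileset>' pipeline argument.

    Hypothetical helper: `schemas` maps fileset name -> schema dict.
    """
    return {
        f"multisink.{fileset}": json.dumps(schema)
        for fileset, schema in schemas.items()
    }


accounts_schema = {
    "type": "record",
    "name": "accounts",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["string", "null"]},
    ],
}
activity_schema = {
    "type": "record",
    "name": "activity",
    "fields": [
        {"name": "userid", "type": "long"},
        {"name": "item", "type": "string"},
        {"name": "action", "type": "string"},
    ],
}

arguments = multisink_arguments(
    {"accounts": accounts_schema, "activity": activity_schema}
)
print(sorted(arguments))  # ['multisink.accounts', 'multisink.activity']
```

Generating the keys from the fileset names keeps the ‘multisink.’ prefix consistent and avoids hand-editing JSON strings.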
Created in 2020 by Google Inc.