Dynamic Multiple Fileset Sink (Deprecated)

Note: Datasets and the Dynamic Multiple Fileset Sink are deprecated and will be removed in CDAP 7.0.0.

This plugin is normally used with the Multiple Database Table batch source to write records from multiple databases into multiple filesets in text format. Each fileset it writes to contains a single ‘ingesttime’ partition, which holds the logical start time of the pipeline run.

The plugin expects the filesets it writes to be declared as pipeline arguments, where the key is ‘multisink.[fileset]’ and the value is the fileset schema. Normally, the Multiple Database Table source sets those pipeline arguments, but they can also be set manually or by an Action plugin in your pipeline, such as an HTTP Argument Setter.

The sink expects each record to contain a special Split Field, which determines the fileset each record is written to. For example, suppose the split field is ‘tablename’. A record whose ‘tablename’ field is set to ‘activity’ will be written to the ‘activity’ fileset.
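The routing rule described above can be sketched in a few lines of Python. This only illustrates the grouping behavior and is not the plugin's implementation; `route_records` and the dict-based record shape are hypothetical names chosen for the example.

```python
from collections import defaultdict


def route_records(records, split_field="tablename"):
    """Group records by the value of the split field (hypothetical helper)."""
    filesets = defaultdict(list)
    for record in records:
        # The value of the split field names the target fileset.
        filesets[record[split_field]].append(record)
    return dict(filesets)


records = [
    {"id": 0, "name": "Samuel", "tablename": "accounts"},
    {"userid": 0, "item": "shirt123", "action": "view", "tablename": "activity"},
]
routed = route_records(records)
print(sorted(routed))  # ['accounts', 'activity']
```

A record that lacks the split field raises a `KeyError` in this sketch; the source does not say how the plugin itself handles that case.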

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Split Field | No | Optional. The name of the field that will be used to determine which fileset to write to. Default is ‘tablename’. |
| Field Delimiter | No | Optional. The delimiter used to separate record fields. Defaults to the tab character. |
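As a rough illustration of what the Field Delimiter controls, the sketch below joins the fields of one record into a single text line. The helper name `to_delimited_line` is hypothetical, and rendering a null value as an empty string is an assumption of this sketch; the source does not describe how the plugin writes nulls or escapes delimiters inside values.

```python
def to_delimited_line(record, field_names, delimiter="\t"):
    """Join the named fields of a record with the delimiter (hypothetical helper)."""
    values = []
    for name in field_names:
        value = record.get(name)
        # Assumption: a missing or null field becomes an empty string.
        values.append("" if value is None else str(value))
    return delimiter.join(values)


record = {"id": 0, "name": "Samuel", "email": "sjax@example.net"}
fields = ["id", "name", "email"]

line_default = to_delimited_line(record, fields)               # tab-separated
line_comma = to_delimited_line(record, fields, delimiter=",")  # comma-separated
print(line_comma)  # 0,Samuel,sjax@example.net
```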

Example

This example uses a comma to delimit record fields:

| Property | Value |
| --- | --- |
| Field Delimiter | "," |

Suppose the input records are:

| id | name | email | tablename |
| --- | --- | --- | --- |
| 0 | Samuel | sjax@example.net | accounts |
| 1 | Alice | a@example.net | accounts |

| userid | item | action | tablename |
| --- | --- | --- | --- |
| 0 | shirt123 | view | activity |
| 0 | carxyz | view | activity |
| 0 | shirt123 | buy | activity |
| 0 | coffee | view | activity |
| 1 | cola | buy | activity |

The plugin expects two pipeline arguments, which tell it to write the first two records to an ‘accounts’ fileset and the remaining five records to an ‘activity’ fileset:

multisink.accounts =
  {
    "type": "record",
    "name": "accounts",
    "fields": [
      { "name": "id", "type": "long" },
      { "name": "name", "type": "string" },
      { "name": "email", "type": [ "string", "null" ] }
    ]
  }

multisink.activity =
  {
    "type": "record",
    "name": "activity",
    "fields": [
      { "name": "userid", "type": "long" },
      { "name": "item", "type": "string" },
      { "name": "action", "type": "string" }
    ]
  }
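When these arguments are set manually rather than by the Multiple Database Table source, the key/value pairs can be generated instead of typed by hand. The following is a minimal sketch that only builds the strings; `multisink_argument` is a hypothetical helper, and how the resulting arguments are passed to the pipeline run is outside the scope of this example.

```python
import json


def multisink_argument(fileset, fields):
    """Return the (key, value) pair for one fileset (hypothetical helper)."""
    schema = {
        "type": "record",
        "name": fileset,
        "fields": [{"name": name, "type": type_} for name, type_ in fields],
    }
    # Key follows the 'multisink.[fileset]' pattern; value is the schema as JSON.
    return f"multisink.{fileset}", json.dumps(schema)


arguments = dict([
    multisink_argument("accounts", [
        ("id", "long"), ("name", "string"), ("email", ["string", "null"]),
    ]),
    multisink_argument("activity", [
        ("userid", "long"), ("item", "string"), ("action", "string"),
    ]),
])

for key, value in arguments.items():
    print(key, "=", value)
```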



Created in 2020 by Google Inc.