Avro Dynamic Partitioned Dataset Sink (Deprecated)

The Avro Dynamic Partitioned Dataset sink is available in the Hub.

Note: Datasets and the Avro Dynamic Partitioned Dataset Sink are deprecated and will be removed in CDAP 7.0.0.

Sink for a PartitionedFileSet that writes data in Avro format and uses the values of one or more record fields to create partitions. During a run, each record is written to the partition determined by the specified fields and their values.

Use this sink when you want to write to a PartitionedFileSet in Avro format and partition the data by a value taken from each record. For example, you might want to load historical data from a database and partition the dataset on the original creation date of the data.

Configuration

Property	Macro Enabled?	Description
Dataset Name	Yes	Required. Name of the PartitionedFileSet to which records are written. If it doesn’t exist, it will be created.
Dataset Base Path	Yes	Optional. Base path for the PartitionedFileSet. Defaults to the name of the dataset, that is, [Namespace]/data/[Dataset name].
Partition Field Names	Yes	Required. One or more fields that will be used to partition the dataset.
Compression Codec	No	Optional. Determines the compression codec to use on the resulting data. Valid values are None, Snappy, and GZip. Default is None.
Append to Existing Partition	No	Optional. Whether to allow appending to existing partitions. Default is No.
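
The required fields, valid values, and defaults listed in the table can be summarized in a short Python sketch. This is not the plugin's code or its API: the class `SinkConfig`, its attribute names, and the `resolved_base_path` helper are hypothetical names chosen for illustration, and only the rules themselves come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Valid values for Compression Codec, as listed in the table.
VALID_CODECS = {"None", "Snappy", "GZip"}


@dataclass
class SinkConfig:
    """Hypothetical container mirroring the properties in the table."""
    dataset_name: str                          # Dataset Name (required)
    partition_field_names: List[str]           # Partition Field Names (required)
    dataset_base_path: Optional[str] = None    # Dataset Base Path (optional)
    compression_codec: str = "None"            # Compression Codec (default None)
    append_to_existing_partition: bool = False # Append to Existing Partition (default No)

    def validate(self) -> None:
        """Raise ValueError if a required property is missing or a value is invalid."""
        if not self.dataset_name:
            raise ValueError("Dataset Name is required")
        if not self.partition_field_names:
            raise ValueError("At least one partition field name is required")
        if self.compression_codec not in VALID_CODECS:
            raise ValueError(f"Unsupported compression codec: {self.compression_codec}")

    def resolved_base_path(self, namespace: str) -> str:
        """Return the explicit base path, or the documented default."""
        if self.dataset_base_path:
            return self.dataset_base_path
        return f"{namespace}/data/{self.dataset_name}"


config = SinkConfig(dataset_name="purchases", partition_field_names=["purchase_date"])
config.validate()
print(config.resolved_base_path("default"))  # prints "default/data/purchases"
```

The dataset name `purchases` and the namespace `default` are example values only.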

Example

Suppose the sink receives the following input records of customers and their purchases:

id	first_name	last_name	street_address	city	state	zipcode	purchase_date
1	Douglas	Williams	1, Vista Montana	San Jose	CA	95134	2009-01-02
2	David	Johnson	3, Baypoint Parkway	Houston	TX	78970	2009-01-01
3	Hugh	Jackman	5, Cool Way	Manhattan	NY	67263	2009-01-01
4	Walter	White	3828 Piermont Dr	Orlando	FL	73498	2009-01-03
5	Frank	Underwood	1609 Far St.	San Diego	CA	29770	2009-01-03
6	Serena	Woods	123 Far St.	Las Vegas	NV	45334	2009-01-01

If purchase_date is chosen as the partition field, the sink creates a PartitionedFileSet with one partition per distinct purchase_date value (2009-01-01, 2009-01-02, and 2009-01-03) and populates the partitions as follows:

id	first_name	last_name	street_address	city	state	zipcode	purchase_date
2	David	Johnson	3, Baypoint Parkway	Houston	TX	78970	2009-01-01
3	Hugh	Jackman	5, Cool Way	Manhattan	NY	67263	2009-01-01
6	Serena	Woods	123 Far St.	Las Vegas	NV	45334	2009-01-01

id	first_name	last_name	street_address	city	state	zipcode	purchase_date
1	Douglas	Williams	1, Vista Montana	San Jose	CA	95134	2009-01-02

id	first_name	last_name	street_address	city	state	zipcode	purchase_date
4	Walter	White	3828 Piermont Dr	Orlando	FL	73498	2009-01-03
5	Frank	Underwood	1609 Far St.	San Diego	CA	29770	2009-01-03
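
The grouping shown in this example can be sketched in a few lines of Python. This is only an illustration of how records map to partitions by field value, not the sink's actual implementation: the function name `partition_records` is hypothetical, and the records carry only a subset of the example's columns to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def partition_records(
    records: List[Record], partition_fields: List[str]
) -> Dict[Tuple[str, ...], List[Record]]:
    """Group records by the values of the given partition fields."""
    partitions: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for record in records:
        # The partition key is the tuple of values of the partition fields.
        key = tuple(record[field] for field in partition_fields)
        partitions[key].append(record)
    return dict(partitions)


# Subset of the columns from the example input records.
records = [
    {"id": "1", "first_name": "Douglas", "purchase_date": "2009-01-02"},
    {"id": "2", "first_name": "David", "purchase_date": "2009-01-01"},
    {"id": "3", "first_name": "Hugh", "purchase_date": "2009-01-01"},
    {"id": "4", "first_name": "Walter", "purchase_date": "2009-01-03"},
    {"id": "5", "first_name": "Frank", "purchase_date": "2009-01-03"},
    {"id": "6", "first_name": "Serena", "purchase_date": "2009-01-01"},
]

partitions = partition_records(records, ["purchase_date"])
for key in sorted(partitions):
    print(key[0], [r["id"] for r in partitions[key]])
# 2009-01-01 ['2', '3', '6']
# 2009-01-02 ['1']
# 2009-01-03 ['4', '5']
```

Passing more than one field name, for example `["state", "purchase_date"]` on records that include both fields, would produce one group per distinct combination of values.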