Info |
---|
The Avro Dynamic Partitioned Dataset sink is available in the Hub. |
Note: Datasets and the Avro Dynamic Partitioned Dataset Sink are deprecated and will be removed in CDAP 7.0.0.
Sink for a PartitionedFileSet that writes data in Avro format and uses the values of one or more record fields to create partitions. Each record is written to the partition determined by the values of the specified fields, so a single run can write to many partitions.
Use this sink when you want to write to a PartitionedFileSet in Avro format and use values from each record as the partition key. For example, you might load historical data from a database and partition the dataset on the original creation date of the data.
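The routing idea can be illustrated with a small Python sketch. It is a conceptual model only, not the plugin's implementation: it groups records by the tuple of values of the configured partition fields, using the creation-date scenario described above and a second field to show partitioning on more than one field. The function `partition_records` and the record fields (`id`, `created_date`, `region`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def partition_records(
    records: Iterable[Record], partition_fields: List[str]
) -> Dict[Tuple[object, ...], List[Record]]:
    """Group records by the values of the partition fields (conceptual sketch)."""
    partitions: Dict[Tuple[object, ...], List[Record]] = defaultdict(list)
    for record in records:
        # The partition key is the ordered tuple of the record's partition field values.
        key = tuple(record[field] for field in partition_fields)
        partitions[key].append(record)
    return dict(partitions)


# Hypothetical historical rows loaded from a database.
rows = [
    {"id": 1, "created_date": "2015-01-15", "region": "us"},
    {"id": 2, "created_date": "2015-01-16", "region": "us"},
    {"id": 3, "created_date": "2015-01-15", "region": "eu"},
    {"id": 4, "created_date": "2015-01-15", "region": "us"},
]

# One partition field: one partition per distinct creation date.
by_date = partition_records(rows, ["created_date"])

# Two partition fields: one partition per distinct (date, region) pair.
by_date_region = partition_records(rows, ["created_date", "region"])

for key, recs in sorted(by_date.items()):
    print(key, [r["id"] for r in recs])
```

With one field, the four rows land in two partitions; adding `region` splits the 2015-01-15 rows further, giving three partitions.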
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Dataset Name | Yes | Required. Name of the PartitionedFileSet to which records are written. If it doesn’t exist, it will be created. |
Dataset Base Path | Yes | Optional. Base path for the PartitionedFileSet. Defaults to the dataset name under the namespace data directory: [Namespace]/data/[Dataset name]. |
Partition Field Names | Yes | Required. One or more fields that will be used to partition the dataset. |
Compression Codec | No | Optional. Compression codec applied to the written data. Valid values are None, Snappy, and GZip. Default is None. |
Append to Existing Partition | No | Optional. Whether to allow appending to partitions that already exist. Default is No. |
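The settings in the table can be summarized in a small Python model. This is a hypothetical sketch: the class `SinkConfig`, its attribute names, and `check_partition_write` are illustrative and are not the plugin's property keys or API. It encodes the documented defaults and valid codec values, and it assumes that, with appending disabled, a write to a partition that already exists is rejected.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Valid values listed for Compression Codec.
VALID_CODECS = {"None", "Snappy", "GZip"}


@dataclass
class SinkConfig:
    """Hypothetical model of the sink settings (names are illustrative)."""

    dataset_name: str                       # Dataset Name (required)
    partition_field_names: List[str]        # Partition Field Names (required)
    base_path: Optional[str] = None         # Dataset Base Path (optional)
    compression_codec: str = "None"         # Compression Codec, default None
    append_to_existing_partition: bool = False  # Append to Existing Partition, default No

    def validate(self) -> None:
        if not self.dataset_name:
            raise ValueError("Dataset Name is required")
        if not self.partition_field_names:
            raise ValueError("at least one partition field is required")
        if self.compression_codec not in VALID_CODECS:
            raise ValueError(f"invalid compression codec: {self.compression_codec}")

    def resolved_base_path(self, namespace: str) -> str:
        # Default follows the table's pattern: [Namespace]/data/[Dataset name]
        return self.base_path or f"{namespace}/data/{self.dataset_name}"


def check_partition_write(
    config: SinkConfig, existing: Set[Tuple[object, ...]], key: Tuple[object, ...]
) -> str:
    """Decide how a write to partition `key` is handled (assumed semantics)."""
    if key not in existing:
        return "create"
    if config.append_to_existing_partition:
        return "append"
    # Assumption: appending disabled means an existing partition cannot be written to.
    raise RuntimeError(f"partition {key} already exists and appending is disabled")


cfg = SinkConfig(dataset_name="purchases", partition_field_names=["created_date"])
cfg.validate()
print(cfg.resolved_base_path("default"))  # default/data/purchases
```

Keeping the append decision separate from validation mirrors the table: the codec and required fields can be checked up front, while the append setting only matters once a target partition is known to exist.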
Example
Suppose the sink receives input records from customers and purchases:
...