Avro Dynamic Partitioned Dataset Sink (Deprecated)
The Avro Dynamic Partitioned Dataset sink is available in the Hub.
Note: Datasets and the Avro Dynamic Partitioned Dataset Sink are deprecated and will be removed in CDAP 7.0.0.
Sink for a PartitionedDataset that writes data in Avro format and uses one or more record field values to create partitions. Each record in the run is written to the partition determined by its values for the specified fields.
Use this sink when you want to write to a PartitionedFileSet in Avro format using one or more values from the record as the partition. For example, you might want to load historical data from a database and partition the dataset on the original creation date of the data.
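The partitioning behavior described above can be illustrated with a minimal Python sketch. It is not the sink's implementation; it only shows the idea of grouping records by the values of the chosen partition fields. The function name partition_records is hypothetical, and the records are a trimmed-down subset of the sample data in the Example section.

```python
from collections import defaultdict


def partition_records(records, partition_fields):
    """Group records by the tuple of values found in partition_fields.

    Hypothetical helper for illustration only; the real sink writes each
    group to a partition of the dataset instead of returning a dict.
    """
    partitions = defaultdict(list)
    for record in records:
        # The partition key is built from the record's own field values.
        key = tuple(record[field] for field in partition_fields)
        partitions[key].append(record)
    return dict(partitions)


# Subset of the sample records, reduced to three fields.
records = [
    {"id": 1, "state": "CA", "purchase_date": "2009-01-02"},
    {"id": 2, "state": "TX", "purchase_date": "2009-01-01"},
    {"id": 3, "state": "NY", "purchase_date": "2009-01-01"},
    {"id": 4, "state": "FL", "purchase_date": "2009-01-03"},
]

by_date = partition_records(records, ["purchase_date"])
for key, rows in sorted(by_date.items()):
    print(key, [row["id"] for row in rows])
```

Passing more than one field name, for example ["state", "purchase_date"], yields one group per distinct combination of values, which mirrors how multiple partition fields would subdivide the data.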
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Dataset Name | Yes | Required. Name of the PartitionedFileSet to which records are written. If it doesn’t exist, it will be created. |
Dataset Base Path | Yes | Optional. Base path for the PartitionedFileSet. Default is [Namespace]/data/[Dataset name], which is derived from the name of the dataset. |
Partition Field Names | Yes | Required. One or more fields that will be used to partition the dataset. |
Compression Codec | No | Optional. Determines the compression codec to use on the resulting data. Valid values are None, Snappy, and GZip. Default is None. |
Append to Existing Partition | No | Optional. Whether to allow appending to existing partitions. Default is No. |
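As a rough illustration of the rules in the table above, the following Python sketch checks a configuration for the required properties and the valid compression codecs, and fills in the documented defaults. The dictionary keys are the display labels from the table, not the plugin's actual property names, and validate_sink_config is a hypothetical helper, not part of CDAP.

```python
# Valid values as listed in the Compression Codec row of the table.
VALID_CODECS = {"None", "Snappy", "GZip"}


def validate_sink_config(config):
    """Return (resolved_config, errors) for a dict keyed by display labels.

    Hypothetical helper for illustration; it only encodes the rules
    stated in the configuration table.
    """
    errors = []
    resolved = dict(config)

    # Required properties.
    if not resolved.get("Dataset Name"):
        errors.append("Dataset Name is required")
    if not resolved.get("Partition Field Names"):
        errors.append("Partition Field Names needs at least one field")

    # Optional properties with documented defaults.
    resolved.setdefault("Compression Codec", "None")
    resolved.setdefault("Append to Existing Partition", "No")

    if resolved["Compression Codec"] not in VALID_CODECS:
        errors.append(
            "Compression Codec must be one of " + ", ".join(sorted(VALID_CODECS))
        )
    return resolved, errors


resolved, errors = validate_sink_config(
    {"Dataset Name": "purchases", "Partition Field Names": ["purchase_date"]}
)
print(resolved["Compression Codec"], errors)
```

The Dataset Base Path default is left out of the sketch because it depends on the namespace, which is only known at deployment time.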
Example
Suppose the sink receives the following customer purchase records as input:
id | first_name | last_name | street_address | city | state | zipcode | purchase_date |
---|---|---|---|---|---|---|---|
1 | Douglas | Williams | 1, Vista Montana | San Jose | CA | 95134 | 2009-01-02 |
2 | David | Johnson | 3, Baypoint Parkway | Houston | TX | 78970 | 2009-01-01 |
3 | Hugh | Jackman | 5, Cool Way | Manhattan | NY | 67263 | 2009-01-01 |
4 | Walter | White | 3828 Piermont Dr | Orlando | FL | 73498 | 2009-01-03 |
5 | Frank | Underwood | 1609 Far St. | San Diego | CA | 29770 | 2009-01-03 |
6 | Serena | Woods | 123 Far St. | Las Vegas | NV | 45334 | 2009-01-01 |
If we choose purchase_date as the partition field, the sink will create a PartitionedDataset and populate the partitions as follows:
id | first_name | last_name | street_address | city | state | zipcode | purchase_date |
---|---|---|---|---|---|---|---|
2 | David | Johnson | 3, Baypoint Parkway | Houston | TX | 78970 | 2009-01-01 |
3 | Hugh | Jackman | 5, Cool Way | Manhattan | NY | 67263 | 2009-01-01 |
6 | Serena | Woods | 123 Far St. | Las Vegas | NV | 45334 | 2009-01-01 |
id | first_name | last_name | street_address | city | state |
---|