
Note: Datasets and the Parquet Dynamic Partitioned Dataset Sink are deprecated and will be removed in CDAP 7.0.0.

Sink for a PartitionedFileSet that writes data in Parquet format and uses the values of one or more record fields to create partitions. Each record is written to the partition identified by the values of the specified fields in that record.

Use this sink when you want to write to a PartitionedFileSet in Parquet format and partition by values taken from the records themselves. For example, you might load historical data from a database and partition the dataset on the original creation date of the data.
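The routing idea can be pictured with the minimal Python sketch below. It is not the plugin's implementation: it only groups records by the values of the configured partition fields, the way a dynamic partitioned sink sends each record to a partition. The field names (`creation_date`, `id`, `amount`) and the `field=value` directory layout are hypothetical, chosen for illustration; the actual on-disk layout of a PartitionedFileSet may differ.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
PartitionKey = Tuple[Tuple[str, Any], ...]


def partition_key(record: Record, field_names: List[str]) -> PartitionKey:
    """Build an ordered (field, value) key from the configured partition fields."""
    missing = [f for f in field_names if f not in record]
    if missing:
        raise KeyError(f"record is missing partition field(s): {missing}")
    return tuple((f, record[f]) for f in field_names)


def partition_path(key: PartitionKey) -> str:
    """Hypothetical Hive-style layout, e.g. 'creation_date=2015-01-01'."""
    return "/".join(f"{name}={value}" for name, value in key)


def group_by_partition(records: List[Record], field_names: List[str]) -> Dict[str, List[Record]]:
    """Route every record to the partition determined by its own field values."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        partitions[partition_path(partition_key(record, field_names))].append(record)
    return dict(partitions)


if __name__ == "__main__":
    history = [
        {"id": 1, "creation_date": "2015-01-01", "amount": 10.0},
        {"id": 2, "creation_date": "2015-01-02", "amount": 5.5},
        {"id": 3, "creation_date": "2015-01-01", "amount": 7.25},
    ]
    for path, rows in group_by_partition(history, ["creation_date"]).items():
        print(path, [r["id"] for r in rows])
```

Running it prints one line per partition: records 1 and 3 land in `creation_date=2015-01-01` and record 2 in `creation_date=2015-01-02`, so a single run can populate several partitions.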

Configuration

Property	Macro Enabled?	Description
Dataset Name	Yes	Required. Name of the PartitionedFileSet to which records are written. If it doesn’t exist, it will be created.
Dataset Base Path	Yes	Optional. Base path for the PartitionedFileSet. Defaults to a path derived from the dataset name: [Namespace]/data/[Dataset name].
Partition Field Names	Yes	Required. One or more fields that will be used to partition the dataset.
Compression Codec	No	Optional. Compression codec to apply to the output data. Valid values are None, Snappy, GZip, and LZO. Default is None.
Append to Existing Partition	No	Optional. Whether to allow appending to existing partitions. Default is No.
Output Schema	Yes	Required. The Avro schema, as a JSON object, of the records written to the sink.
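Because partition values come from the records being written, a configuration only makes sense if every Partition Field Name appears in the Output Schema and the Compression Codec is one of the listed values. The Python sketch below checks this. It is a hypothetical helper, not the plugin's own validation: the `check_config` name, the schema's record name (`purchase`), and its fields are made up for illustration; only the codec values come from the table above.

```python
import json
from typing import List

# Codec values listed in the Compression Codec row of the table.
VALID_CODECS = {"None", "Snappy", "GZip", "LZO"}

# Hypothetical Output Schema: an Avro record schema expressed as a JSON object.
OUTPUT_SCHEMA = json.dumps({
    "type": "record",
    "name": "purchase",
    "fields": [
        {"name": "customer", "type": "string"},
        {"name": "item", "type": "string"},
        {"name": "price", "type": "double"},
        {"name": "creation_date", "type": "string"},
    ],
})


def check_config(schema_json: str, partition_fields: List[str], codec: str = "None") -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems: List[str] = []
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        problems.append("Output Schema must be an Avro record schema")
    field_names = {f["name"] for f in schema.get("fields", [])}
    if not partition_fields:
        problems.append("at least one Partition Field Name is required")
    for name in partition_fields:
        if name not in field_names:
            problems.append(f"partition field '{name}' is not in the Output Schema")
    if codec not in VALID_CODECS:
        problems.append(f"unsupported compression codec '{codec}'")
    return problems


if __name__ == "__main__":
    print(check_config(OUTPUT_SCHEMA, ["creation_date"], "Snappy"))  # prints []
    print(check_config(OUTPUT_SCHEMA, ["region"], "Zstd"))
```

The second call reports two problems: `region` is not a field of the schema, and `Zstd` is not among the listed codecs.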

Example

For example, suppose the sink receives input records from customers and purchases:

...
