File Sink

Plugin version: 2.11.0

Writes to a file system in various formats.

For the csv, delimited, and tsv formats, each record is written as delimited text. Complex types such as arrays, maps, and records are converted to strings using their Java toString() method, so in practice fields should be limited to the string, long, int, double, float, and boolean types.

All types are supported when using the avro or parquet format.
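To illustrate why complex types are awkward in the delimited formats, here is a minimal Python sketch. It assumes field values are simply stringified and joined without quoting or escaping, which may not match the plugin's exact behavior. Python's str() stands in for Java's toString(), and the record and the to_delimited helper are hypothetical names used only for illustration.

```python
# Hypothetical record with one complex (list) field.
# str() stands in for Java's toString(); this is not plugin code.
record = {"id": 7, "name": "widget", "tags": ["red", "small"]}


def to_delimited(rec, delimiter=","):
    """Join field values as plain strings, with no quoting or escaping."""
    return delimiter.join(str(value) for value in rec.values())


line = to_delimited(record)
print(line)                  # 7,widget,['red', 'small']
print(len(line.split(",")))  # 4 columns for 3 fields
```

Because the stringified list contains the delimiter, a reader that splits on commas sees four columns instead of three. Flattening such fields into simple types first, or switching to avro or parquet, avoids the problem.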

Configuration

Property

Macro Enabled?

Description

Reference Name

No

Required. Name that uniquely identifies this sink for lineage, metadata annotation, and similar purposes.

Path

Yes

Required. Path to write to. For example, /path/to/output.

You can also use the logicalStartTime function to append a date to the output filename.

Path Suffix

Yes

Optional. Time format for the output directory that will be appended to the path. For example, the format ‘yyyy-MM-dd-HH-mm’ will result in a directory of the form ‘2015-01-01-20-42’. If not specified, nothing will be appended to the path.

Default is yyyy-MM-dd-HH-mm.

Format

No

Required. Format to write the records in. The format must be one of ‘json’, ‘avro’, ‘parquet’, ‘csv’, ‘tsv’, or ‘delimited’.

Default is json.

Delimiter

Yes

Optional. Delimiter to use if the format is ‘delimited’.

Write Header

Yes

Optional. Whether to write a header to each file if the format is ‘delimited’, ‘csv’, or ‘tsv’.

Default is false.

File System Properties

Yes

Optional. Additional properties to use with the OutputFormat when writing the data. You can use this property to set the prefix of the output file name. If you are using avro, you can set the file prefix with a property like the following:

{
  "avro.mo.config.namedOutput": "sales-2002"
}

If you are using any other file format, you can set the file prefix with a property like the following one:

{
  "mapreduce.output.basename": "sales-2002"
}

This property is also useful when multiple runs of a reusable pipeline write to the same output directory, since a distinct prefix for each run keeps the output files distinguishable.

Output Schema

Yes

Required. Schema of the data to write.
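As a rough illustration of how Path, Path Suffix, and File System Properties fit together, the Python sketch below renders the suffix for a given run time and picks the file-prefix property for a format. It is not plugin code. The helper names, the small pattern-to-strftime mapping (which covers only the letters in the documented default), and the assumption that the suffix is joined to the path as a subdirectory are all hypothetical.

```python
import json
from datetime import datetime

# Hypothetical mapping from the Java-style pattern parts used by Path Suffix
# to Python strftime codes. Only the parts in the documented default are covered.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}


def render_suffix(pattern, when):
    """Render a hyphen-separated pattern such as 'yyyy-MM-dd-HH-mm'."""
    return "-".join(when.strftime(_JAVA_TO_STRFTIME[part]) for part in pattern.split("-"))


def output_directory(path, suffix_pattern, when):
    """Append the rendered suffix to the path; an empty pattern appends nothing."""
    if not suffix_pattern:
        return path
    # Assumption: the suffix becomes a subdirectory of the configured path.
    return path.rstrip("/") + "/" + render_suffix(suffix_pattern, when)


def file_prefix_property(fmt, prefix):
    """Return the File System Properties entry that sets the output file prefix."""
    key = "avro.mo.config.namedOutput" if fmt == "avro" else "mapreduce.output.basename"
    return {key: prefix}


run_time = datetime(2015, 1, 1, 20, 42)
directory = output_directory("/path/to/output", "yyyy-MM-dd-HH-mm", run_time)
properties = json.dumps(file_prefix_property("parquet", "sales-2002"))
print(directory)   # /path/to/output/2015-01-01-20-42
print(properties)  # {"mapreduce.output.basename": "sales-2002"}
```

Serializing the properties with json.dumps also avoids hand-written mistakes such as a trailing comma, which would make the JSON invalid.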

Created in 2020 by Google Inc.