FTP Batch Source

Plugin version: 4.0.0

Note: Versions 2.9.0 and below of the FTP batch source system plugin are deprecated. Instead, use the FTP batch source plugin (3.0.0 and later) available in the Hub.

Batch source that reads from an FTP or SFTP server. The prefix of the path ('ftp://...' or 'sftp://...') determines the server type, either FTP or SFTP.

This source is used whenever you need to read from an FTP or SFTP server.
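
As a rough illustration of how the path prefix maps to a server type, the following Python sketch splits a path of the documented form prefix://username:password@hostname:port/path into its parts. The function name `describe_source_path` and the returned dictionary keys are hypothetical and are not part of the plugin. The fallback ports are the defaults documented for the Port property. The sketch relies on standard URL parsing, so a glob containing '?' would be read as a query string here; how the plugin itself handles such characters is not covered by this page.

```python
from urllib.parse import urlsplit

# Default ports as documented for the Port property.
DEFAULT_PORTS = {"ftp": 21, "sftp": 22}


def describe_source_path(path: str) -> dict:
    """Split a path of the form prefix://username:password@hostname:port/path.

    Hypothetical helper for illustration only; not the plugin's own parser.
    """
    parts = urlsplit(path)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported prefix: {parts.scheme!r}")
    return {
        # The prefix decides whether the server is FTP or SFTP.
        "server_type": scheme.upper(),
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Fall back to the documented default when no port is given.
        "port": parts.port if parts.port is not None else DEFAULT_PORTS[scheme],
        "path": parts.path,
    }
```

For example, `describe_source_path("sftp://username:password@hostname/path/to/logs")` reports the server type `SFTP` and falls back to port 22 because the path does not name a port.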

Configuration

Property

Macro Enabled?

Version Introduced

Description

Reference Name

Yes

 

Required. Name used to uniquely identify this source for lineage, annotating metadata, and similar purposes.

Server Type

Yes

 

Required. Whether to read from an FTP or SFTP server.

Host

Yes

4.0.0

Required. Host to read from.

Port

Yes

4.0.0

Optional. Port to read from. If no value is given, it defaults to port 21 for FTP and port 22 for SFTP.

Path

Yes

 

Required. Path to the file(s) to read. The path uses filename expansion (globbing) to select files. The path must have the form prefix://username:password@hostname:port/path, where prefix is ftp or sftp.

Username

Yes

4.0.0

Required. The username to use for authentication.

Password

Yes

4.0.0

Required. The password to use for authentication.

Format

Yes

3.1.0/3.2.0

Optional. Format of the data to read. The format must be one of ‘blob’, ‘csv’, ‘delimited’, ‘json’, ‘text’, ‘tsv’, or the name of any format plugin that you have deployed to your environment. Note that FTP does not support seeking in a file, so formats like avro and parquet cannot be used. If the format is a macro, only the formats listed above can be used. If the format is ‘blob’, every input file will be read into a separate record. The ‘blob’ format also requires a schema that contains a field named ‘body’ of type ‘bytes’. If the format is ‘text’, the schema must contain a field named ‘body’ of type ‘string’.

Get Schema

No

3.1.0/3.2.0

Auto-detects schema from file. Supported formats are: csv, delimited, tsv, blob and text.

Blob: the schema defaults to a single field named ‘body’ of type ‘bytes’.

Text: the schema defaults to two fields, ‘body’ of type ‘string’ and ‘offset’ of type ‘long’.

JSON: not supported. You must provide the output schema manually.

Delimiter

Yes

3.1.0/3.2.0

Optional. Delimiter to use when the format is ‘delimited’. This will be ignored for other formats.

Use First Row as Header

Yes

3.1.0/3.2.0

Optional. Whether to use the first line of each file as the column headers. Supported formats are ‘text’, ‘csv’, ‘tsv’, and ‘delimited’.

Enable Quoted Values

Yes

3.1.0/3.2.0

Optional. Whether to treat content between quotes as a value. This value will only be used if the format is ‘csv’, ‘tsv’ or ‘delimited’. For example, if this is set to true, a line that looks like 1, "a, b, c" will output two fields. The first field will have 1 as its value and the second will have a, b, c as its value. The quote characters will be trimmed. The newline delimiter cannot be within quotes.

Quotes are assumed to be well enclosed: an opening quote is matched with the first following quote that appears right before a delimiter. An unenclosed quote causes an error.

Regex Path Filter

Yes

 

Optional. Regular expression that file paths must match in order to be included in the input. The full file path is compared, not just the filename. If no value is given, no file filtering is done. For more information about regular expression syntax, see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.

File System Properties

Yes

 

Optional. A JSON string representing a map of properties needed for the distributed file system.

Allow Empty Input

No

 

Optional. Whether to ignore the path when the directory or file does not exist. If set to true, a missing path is treated as zero input and a warning is logged.

Default is False.
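
Two of the properties above, Regex Path Filter and Enable Quoted Values, are easier to follow with a small experiment. The Python sketch below approximates both behaviours with the standard library. It is not the plugin's implementation: the helper names `passes_path_filter` and `split_line` are hypothetical, Python's `re` syntax differs from Java's `Pattern` syntax in some details, and whole-string matching is an assumption based on the statement that the full file path is compared.

```python
import csv
import re


def passes_path_filter(pattern: str, file_path: str) -> bool:
    """Return True when the whole file path matches the pattern.

    Assumption: the filter must match the full path, not a substring.
    """
    # fullmatch() requires the entire string to match, not just a part of it.
    return re.fullmatch(pattern, file_path) is not None


def split_line(line: str, delimiter: str = ",", quoted: bool = True) -> list:
    """Split one line, optionally treating content between quotes as one value."""
    reader = csv.reader(
        [line],
        delimiter=delimiter,
        # QUOTE_NONE makes the reader treat quote characters as ordinary text.
        quoting=csv.QUOTE_MINIMAL if quoted else csv.QUOTE_NONE,
        skipinitialspace=True,  # ignore the space that follows each delimiter
    )
    return next(reader)
```

With these helpers, the pattern `.*\.csv` accepts `/path/to/logs/2020-01-01.csv`, while `2020-01-01\.csv` rejects it because the pattern covers only the filename. `split_line('1, "a, b, c"')` returns two fields, `1` and `a, b, c`, with the quote characters trimmed. With `quoted=False` the same line splits at every comma.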

Example

This example connects to an SFTP server and reads in files found in the specified directory.

Property

Value

Reference Name

ftp

Path

sftp://username:password@hostname:22/path/to/logs

Allow Empty Input

false
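
The effect of the Allow Empty Input setting in this example can be sketched with a local directory standing in for the remote server. The Python snippet below is only an analogy: `resolve_inputs` is a hypothetical helper, `glob.glob` works on the local file system and not on FTP or SFTP, and raising an error when the property is false is an assumption that the description implies but does not state.

```python
import glob
import logging

logger = logging.getLogger("ftp_source_sketch")


def resolve_inputs(path_pattern: str, allow_empty_input: bool = False) -> list:
    """Expand a glob pattern and apply the Allow Empty Input rule."""
    matches = sorted(glob.glob(path_pattern))
    if matches:
        return matches
    if allow_empty_input:
        # A missing path is treated as zero input, with a warning.
        logger.warning("No input found for %s; continuing with 0 inputs", path_pattern)
        return []
    # Assumption: with the property set to false, a missing path is an error.
    raise FileNotFoundError(f"No input found for {path_pattern}")
```

With `allow_empty_input=True`, a pattern that matches nothing yields an empty list and a logged warning. With the default of false, as in the example above, the same pattern raises an error in this sketch.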



Created in 2020 by Google Inc.