SFTP Copy Action

The SFTP Copy action plugin is available in the Hub.

Plugin version: 1.5.1

SFTP Copy copies files from a specified directory on an SFTP server and writes them to HDFS. The copied files can optionally be uncompressed before they are stored. Files are copied directly to HDFS, without an additional staging area.

Configuration

Property

Macro Enabled?

Description

Host

Yes

Required. Host name of the SFTP server.

Port

Yes

Optional. Port on which the SFTP server is running.

Default is 22.

User Name

Yes

Required. Name of the user that is used to connect to the SFTP server.

Authentication

Yes

Required. Type of authentication used to connect to the SFTP server. Default is PrivateKey.

Private Key

Yes

Optional. Private RSA key used to connect to the SFTP server. It is recommended to store this key in the Secure Key Store and reference it in the configuration through a macro. Must be an RSA key starting with -----BEGIN RSA PRIVATE KEY-----

Private Key Passphrase

Yes

Optional. Passphrase for the RSA private key, if the key was generated with a passphrase.

Password

Yes

Required. Password of the user.

Source directory

Yes

Required. Absolute path of the directory on the SFTP server to copy. If the directory is empty, the plugin does nothing (no-op).

Destination directory

Yes

Required. Directory on the destination file system to which the files are copied. If the directory does not exist, it is created.

Variable name to hold list of copied file names

No

Optional. Name of the variable that holds a comma-separated list of the file names on the SFTP server that were copied during this run of the plugin. This variable is typically used as a macro in the SFTP Delete action to delete the files from the SFTP server once they have been processed successfully.

Default is sftp.copied.file.names.

Regex to match files that needs to be copied

Yes

Optional. Regex to choose only the files that are of interest. All files will be copied by default.

Default is .* (matches every file name)

Properties for SSH

No

Optional. Properties used to configure the SSH connection to the SFTP server. For example, to enable verbose logging, add the property 'LogLevel' with the value 'VERBOSE'. To enable host key checking, set 'StrictHostKeyChecking' to 'yes'. SSH can be configured with the properties described at 'https://linux.die.net/man/5/ssh_config'.

Properties for File System

No

Optional. Properties used to configure the destination file system, for example HDFS or ADLS.

Extract Zip Files

No

Optional. Boolean flag that determines whether zip files on the SFTP server are extracted at the destination while copying. Default is False.
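The Private Key property above accepts only keys whose first line is -----BEGIN RSA PRIVATE KEY----- (PKCS#1 PEM encoding). Recent OpenSSH versions write new keys in a different format by default, with the first line -----BEGIN OPENSSH PRIVATE KEY-----, which does not meet this requirement; running ssh-keygen -t rsa -m PEM generates an RSA key in the PEM format. The Python sketch below is a hypothetical pre-check you could run before storing a key in the Secure Key Store. The function name looks_like_rsa_pem_key is illustrative and not part of the plugin, and it checks only the header line, not the key itself.

```python
# Header that the Private Key property requires (PKCS#1 PEM encoding).
RSA_PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"


def looks_like_rsa_pem_key(key_text):
    """Return True if key_text starts with the PKCS#1 RSA PEM header.

    Hypothetical helper: it only checks the header line and does not
    decode or validate the key material.
    """
    return key_text.startswith(RSA_PEM_HEADER)


if __name__ == "__main__":
    # Placeholder key bodies; real keys contain base64 data between the lines.
    pkcs1_key = "-----BEGIN RSA PRIVATE KEY-----\n<base64>\n-----END RSA PRIVATE KEY-----\n"
    openssh_key = "-----BEGIN OPENSSH PRIVATE KEY-----\n<base64>\n-----END OPENSSH PRIVATE KEY-----\n"
    print(looks_like_rsa_pem_key(pkcs1_key))    # True
    print(looks_like_rsa_pem_key(openssh_key))  # False
```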

Usage Notes

To perform an SFTP copy, the plugin needs the host and port on which the SFTP server is running. SFTP implements secure file transfer over SSH, so it typically uses port 22, which is also the default SSH port. Valid credentials are also required: a user name and password (or a private key, depending on the Authentication setting). Make sure that you can SSH to the SFTP server with the specified user and credentials. The SSH connection to the SFTP server can be customized with additional configuration properties, for example enabling host key checking by setting 'StrictHostKeyChecking' to 'yes'. Specify these additional configurations in the Properties for SSH section.
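One way to confirm that you can SSH to the server before configuring the plugin is to connect by hand with the OpenSSH client. The Python sketch below assembles such a command from the same inputs the plugin takes (host, port, user name, and SSH properties), turning each SSH property into an -o Key=Value option, the form ssh accepts for ssh_config options. The function name build_ssh_check_command and the sample host and user are hypothetical; the plugin itself does not run this command.

```python
import shlex


def build_ssh_check_command(host, user, port=22, ssh_properties=None):
    """Build an OpenSSH client command line for a manual connectivity check.

    Hypothetical helper: each entry in ssh_properties becomes an
    '-o Key=Value' option, mirroring the plugin's Properties for SSH.
    """
    args = ["ssh", "-p", str(port)]
    for key, value in (ssh_properties or {}).items():
        args += ["-o", f"{key}={value}"]
    args.append(f"{user}@{host}")
    return args


if __name__ == "__main__":
    cmd = build_ssh_check_command(
        "sftp.example.com",
        "etl_user",
        ssh_properties={"StrictHostKeyChecking": "yes", "LogLevel": "VERBOSE"},
    )
    print(shlex.join(cmd))
    # ssh -p 22 -o StrictHostKeyChecking=yes -o LogLevel=VERBOSE etl_user@sftp.example.com
```

If the server allows only SFTP sessions and rejects an interactive shell, the sftp client accepts the same -o options, but it takes the port with an uppercase -P.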

Use the Source directory property to specify the directory on the SFTP server to copy. The directory must exist, and its absolute path must be provided. If the directory is empty, execution continues without any error. Destination directory is the absolute path of the directory on HDFS to which the files are copied. If the destination directory does not exist, it is created first. If a file with the same name already exists in the destination directory, it is overwritten.

Files from the SFTP server can optionally be uncompressed while they are copied to HDFS. Currently, uncompression is supported only for zip files.
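The Extract Zip Files behavior can be sketched with Python's zipfile module, again with a local directory standing in for HDFS. The sketch assumes that a file is treated as a zip archive based on its .zip extension and that its entries are written directly under the destination directory, keeping their paths inside the archive; the plugin may differ on both points. The function name store_file is hypothetical.

```python
import io
import os
import tempfile
import zipfile


def store_file(name, data, dest_dir, extract_zip=False):
    """Write one downloaded file into dest_dir, optionally unpacking a zip.

    Hypothetical helper: 'data' stands in for the bytes read from the SFTP
    server. Returns the paths written, relative to dest_dir.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Assumption: zip files are recognized by their extension.
    if extract_zip and name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: entries land under dest_dir with their archive paths.
            archive.extractall(dest_dir)
            return sorted(n for n in archive.namelist() if not n.endswith("/"))
    # Non-zip files, or extraction disabled: store the bytes unchanged.
    with open(os.path.join(dest_dir, name), "wb") as out:
        out.write(data)
    return [name]


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("batch/part-1.csv", "a,b\n")
        zf.writestr("batch/part-2.csv", "c,d\n")
    dest = tempfile.mkdtemp()
    print(store_file("batch.zip", buffer.getvalue(), dest, extract_zip=True))
    # ['batch/part-1.csv', 'batch/part-2.csv']
```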

Typically, the SFTP server acts as temporary storage, and files can be deleted once they have been processed. A comma-separated list of the file names on the SFTP server that were copied to HDFS during the current run is stored in a variable, named sftp.copied.file.names by default. The SFTP Delete action can be configured to run at the end of the pipeline and use this variable to determine which files to delete from the SFTP server.
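The hand-off between the two actions can be sketched as follows, assuming the variable behaves like a string-valued pipeline argument that the SFTP Delete action reads through the macro ${sftp.copied.file.names}. The dictionary runtime_args and both function names are hypothetical, and the sample values are bare file names because the exact form of each entry (bare name or full path) is not specified here. The sketch only shows how the comma-separated value is built on the copy side and split on the delete side.

```python
# Default variable name, as documented for the plugin.
DEFAULT_VARIABLE = "sftp.copied.file.names"


def record_copied_files(arguments, copied_names, variable=DEFAULT_VARIABLE):
    """Copy side (hypothetical): store names as one comma-separated string."""
    arguments[variable] = ",".join(copied_names)


def files_to_delete(arguments, variable=DEFAULT_VARIABLE):
    """Delete side (hypothetical): split the string back into file names."""
    value = arguments.get(variable, "")
    return [name for name in value.split(",") if name]


if __name__ == "__main__":
    runtime_args = {}  # Hypothetical stand-in for the pipeline's arguments.
    record_copied_files(runtime_args, ["orders_01.csv", "orders_02.csv"])
    print(runtime_args)  # {'sftp.copied.file.names': 'orders_01.csv,orders_02.csv'}
    print(files_to_delete(runtime_args))  # ['orders_01.csv', 'orders_02.csv']
```

Because the list is a single comma-separated string, a file name that itself contains a comma would be split into two entries on the delete side.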



Created in 2020 by Google Inc.