SFTP Copy Action
The SFTP Copy action plugin is available in the Hub.
Plugin version: 1.5.1
SFTP Copy copies files from a specified directory on an SFTP server and writes them to HDFS. The copied files can optionally be uncompressed before they are stored. Files are copied directly to HDFS, with no additional staging area needed.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Host | Yes | Required. Host name of the SFTP server. |
Port | Yes | Optional. Port on which SFTP server is running. Default is 22. |
User Name | Yes | Required. Name of the user used to connect to the SFTP server. |
Authentication | Yes | Required. Specifies the type of Authentication that will be used to connect to the SFTP Server. Default is PrivateKey. |
Private Key | Yes | Optional. Private RSA key used to connect to the SFTP server. Storing this key in the Secure Key Store and referencing it in the configuration through a macro is recommended. Must be an RSA key starting with -----BEGIN RSA PRIVATE KEY----- |
Private Key Passphrase | Yes | Optional. Passphrase to use with the RSA private key, if a passphrase was specified when the key was generated. |
Password | Yes | Required. Password of the user. |
Source directory | Yes | Required. Absolute path of the directory on the SFTP server which is to be copied. If the directory is empty, the execution of the plugin will be no-op. |
Destination directory | Yes | Required. Destination directory on the file system to which the files are copied. If the directory does not exist, it will be created. |
Variable name to hold list of copied file names | No | Optional. Name of the variable that holds a comma-separated list of the file names on the SFTP server that were copied during this run of the plugin. This variable is usually used as a macro in the SFTP Delete action to delete the files from the SFTP server once they have been processed successfully. Default is sftp.copied.file.names. |
Regex to match files that needs to be copied | Yes | Optional. Regex to choose only the files that are of interest. All files will be copied by default. Default is * |
Properties for SSH | No | Optional. Properties used to configure the SSH connection to the SFTP server. For example, to enable verbose logging, add the property 'LogLevel' with the value 'VERBOSE'. To enable host key checking, set 'StrictHostKeyChecking' to 'yes'. See https://linux.die.net/man/5/ssh_config for the available properties. |
Properties for File System | No | Optional. Properties used to configure the destination file system, for example HDFS or ADLS. |
Extract Zip Files | No | Optional. Boolean flag that determines whether zip files on the SFTP server are extracted at the destination while copying. Default is False. |
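
The effect of the "Regex to match files that needs to be copied" property can be pictured with a small sketch. This is only an illustration in Python, not the plugin's code: the helper `select_files` and the sample file names are hypothetical, and it assumes the pattern must match the whole file name, which this document does not state.

```python
import re


def select_files(file_names, pattern=None):
    """Return the names that match `pattern`; with no pattern, keep every name."""
    if pattern is None:
        # Mirrors the documented default: all files are copied.
        return list(file_names)
    compiled = re.compile(pattern)
    # Assumption: the whole file name must match, not just a substring.
    return [name for name in file_names if compiled.fullmatch(name)]


# Hypothetical directory listing, used only for illustration.
names = ["sales_2020.csv", "sales_2020.zip", "readme.txt"]
print(select_files(names, r".*\.csv"))
print(select_files(names))
```

The plugin itself evaluates the regex on the server listing; Python's `re` syntax is close to, but not identical with, other regex dialects, so a pattern should be checked against the plugin before it is relied on.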
Usage Notes
To perform an SFTP copy, the plugin requires the host and port on which the SFTP server is running. SFTP implements secure file transfer over SSH and typically uses port 22, which is also the default SSH port. Valid credentials are also required: a user name together with a password or a private key, depending on the Authentication setting. Make sure that you can SSH to the SFTP server with the specified credentials. The SSH connection to the SFTP server can be customized with additional configuration, for example enabling host key checking by setting the property 'StrictHostKeyChecking' to 'yes'. These additional settings are specified in the Properties for SSH section.
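
One way to check connectivity by hand before running the pipeline is to open an SSH session with the same port and SSH properties. The sketch below only assembles such a command line; it does not run it. The helper `build_ssh_check_command`, the host `sftp.example.com`, and the user `etl_user` are hypothetical placeholders, while `-p` and `-o` are standard OpenSSH client options.

```python
import shlex


def build_ssh_check_command(host, user, port=22, ssh_properties=None):
    """Assemble an `ssh` command that uses the same settings as the plugin config."""
    command = ["ssh", "-p", str(port)]
    # Each SSH property becomes an `-o Key=Value` option.
    for key, value in (ssh_properties or {}).items():
        command += ["-o", f"{key}={value}"]
    command.append(f"{user}@{host}")
    return command


# Hypothetical host and user, for illustration only.
command = build_ssh_check_command(
    "sftp.example.com",
    "etl_user",
    ssh_properties={"StrictHostKeyChecking": "yes", "LogLevel": "VERBOSE"},
)
print(shlex.join(command))
```

If the printed command connects successfully from a shell, the same host, port, and properties are a reasonable starting point for the plugin configuration.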
The directory on the SFTP server to be copied is specified with the Source directory property. The directory must exist, and its absolute path must be provided. If the directory is empty, execution continues without error. Destination directory is the absolute path of the directory on HDFS to which the files are copied. If the destination directory does not exist, it is created first. If a file with the same name already exists in the destination directory, it is overwritten.
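
The copy rules described above (empty source is a no-op, a missing destination is created, same-named files are overwritten) can be sketched on a local file system. This is a stand-in for illustration only: the plugin copies from SFTP to HDFS, and the function `copy_directory` is hypothetical. Whether the plugin creates the destination when the source is empty is not stated here; the sketch assumes it does not.

```python
import shutil
import tempfile
from pathlib import Path


def copy_directory(source_dir, destination_dir):
    """Copy the files in source_dir to destination_dir and return their names."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"source directory does not exist: {source}")
    files = sorted(p for p in source.iterdir() if p.is_file())
    if not files:
        # Empty source: nothing to do (assumed to leave the destination untouched).
        return []
    destination = Path(destination_dir)
    # Create the destination, including parents, if it is missing.
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in files:
        # copyfile replaces any existing file with the same name.
        shutil.copyfile(path, destination / path.name)
        copied.append(path.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "incoming"
    src.mkdir()
    (src / "a.txt").write_text("new contents")
    print(copy_directory(src, Path(tmp) / "landing" / "today"))
```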
Files from the SFTP server can optionally be uncompressed while they are copied to HDFS. Currently, the uncompress option is supported only for zip files.
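
As a rough picture of what extracting zip files while copying means, the sketch below writes a file's bytes to a local destination, unpacking it first when extraction is enabled. It is an illustration under assumptions, not the plugin's implementation: `store_file` is a hypothetical helper, zip files are recognized by their `.zip` extension, and entries are extracted directly into the destination directory.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def store_file(name, data, destination_dir, extract_zip=False):
    """Write `data` to destination_dir; unpack it if it is a zip and extraction is on."""
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    # Assumption: a zip file is identified by its extension.
    if extract_zip and name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.extractall(destination)
            return archive.namelist()
    # Non-zip files, or extraction disabled: store the bytes unchanged.
    (destination / name).write_bytes(data)
    return [name]


# Build a small zip in memory to stand in for a file fetched from the server.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.txt", "hello")

with tempfile.TemporaryDirectory() as tmp:
    print(store_file("bundle.zip", buffer.getvalue(), tmp, extract_zip=True))
```

With `extract_zip` left at its default of `False`, the archive is stored as a single file, matching the documented default of the Extract Zip Files property.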
Typically, the SFTP server acts as temporary storage for the files, and the files can be deleted once they have been processed. A comma-separated list of the file names on the SFTP server that were copied to HDFS during the current run is stored in a variable, named sftp.copied.file.names by default. The SFTP Delete action can be configured to run at the end of the pipeline and use this variable to determine which files to delete from the SFTP server.
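
A downstream step that reads this variable has to split the comma-separated value back into individual names. A minimal sketch, assuming that file names contain no commas and that surrounding whitespace is not significant; the helper `parse_copied_file_names` and the sample value are hypothetical.

```python
def parse_copied_file_names(value):
    """Split the comma-separated variable value into a list of file names."""
    # Skip empty pieces so an empty value yields an empty list.
    return [name.strip() for name in value.split(",") if name.strip()]


# Hypothetical value of sftp.copied.file.names after a run.
print(parse_copied_file_names("orders_01.csv,orders_02.csv"))
```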
Created in 2020 by Google Inc.