...

  1. Deploy SFTP plugins from the Hub.

  2. Configure the pipeline with an SFTP Copy action to read from the SFTP source.
    The source directory specifies the directory on the SFTP server and can include a wildcard pattern to select files. Each matching file is copied in full to the directory given in the destination directory configuration, which is created on the Dataproc cluster.
    By default, the names of the copied files are held in a variable called sftp.copied.file.names; the variable name is configurable in the SFTP Copy action plugin configuration.

  3. Use a File source to read the contents of the files copied by the SFTP Copy action.
    Set the Path configuration in the File source to the same value as the destination directory configuration in the SFTP Copy action.

  4. Add any additional wrangling steps or any other transformations required to process the data.

  5. Use a BigQuery sink to write the data to a BigQuery table by configuring the BigQuery dataset and table name.
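
The steps above can be sketched as a small outline that makes the key constraint explicit: the File source path must equal the SFTP Copy destination directory. This is an illustrative sketch only. The property keys (sourceDirectory, destinationDirectory, fileNamesVariable, path, dataset, table), the directory values, and the helper names are hypothetical placeholders, not the plugins' actual property names or a real pipeline export format.

```python
# Hypothetical example values; replace with your own.
SRC_PATTERN = "/upload/orders/*.csv"   # SFTP source directory with a wildcard
DEST_DIR = "/tmp/sftp-landing"         # destination directory on the Dataproc cluster


def build_pipeline_outline(src_pattern, dest_dir, dataset, table):
    """Return an outline of the stages described in the steps above.

    The property keys are illustrative placeholders, not the plugins'
    real property names.
    """
    return {
        "actions": [
            {
                "plugin": "SFTP Copy",
                "properties": {
                    "sourceDirectory": src_pattern,
                    "destinationDirectory": dest_dir,
                    # Default variable name mentioned in step 2.
                    "fileNamesVariable": "sftp.copied.file.names",
                },
            }
        ],
        "stages": [
            {"plugin": "File", "properties": {"path": dest_dir}},
            {"plugin": "Wrangler", "properties": {}},
            {"plugin": "BigQuery", "properties": {"dataset": dataset, "table": table}},
        ],
    }


def file_source_matches_copy_destination(outline):
    """Check that the File source reads from the SFTP Copy destination."""
    copy_dest = outline["actions"][0]["properties"]["destinationDirectory"]
    file_path = outline["stages"][0]["properties"]["path"]
    return copy_dest == file_path


outline = build_pipeline_outline(SRC_PATTERN, DEST_DIR, "sales", "orders")
print(file_source_matches_copy_destination(outline))
```

A check like this is useful mainly as a reminder: if the two paths drift apart, the File source reads an empty or stale directory even though the copy succeeded.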

Info

Optionally, use an SFTP Delete action if the files read by SFTP Copy should be deleted. By default, the files to delete are taken from the sftp.copied.file.names variable.
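
The handoff between the copy and delete actions can be pictured with a short simulation. This sketch assumes the variable holds a comma-separated list of file names and that the wildcard behaves like a shell-style glob; both are assumptions for illustration, and the function names are hypothetical rather than part of any plugin.

```python
from fnmatch import fnmatchcase

VARIABLE_NAME = "sftp.copied.file.names"  # default variable name from the text


def simulate_copy(remote_files, pattern):
    """Return runtime variables as the copy step might set them.

    Assumption: copied file names are stored as one comma-separated string.
    """
    copied = [name for name in remote_files if fnmatchcase(name, pattern)]
    return {VARIABLE_NAME: ",".join(copied)}


def files_to_delete(runtime_vars, variable_name=VARIABLE_NAME):
    """Recover the list of files the delete step would act on."""
    raw = runtime_vars.get(variable_name, "")
    return [name for name in raw.split(",") if name]


runtime_vars = simulate_copy(["a.csv", "b.txt", "c.csv"], "*.csv")
print(files_to_delete(runtime_vars))
```

Because the delete step only sees what the copy step recorded, files that did not match the wildcard are left on the server untouched.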

...