Redshift to S3 Action

The Redshift to S3 action plugin is available in the Hub.

The Redshift to S3 Action runs the Redshift UNLOAD command to save the results of a query from Redshift to one or more files on Amazon Simple Storage Service (Amazon S3). For more information on the UNLOAD command, see the AWS Documentation.
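
Conceptually, the action opens a JDBC connection to the cluster and issues a single UNLOAD statement; Redshift itself executes the quoted query and writes the results to S3. The following is a minimal sketch of that interaction, not the plugin's actual implementation. It assumes the Redshift JDBC driver (com.amazon.redshift.jdbc42.Driver) is on the classpath; the endpoint, credentials, bucket, and IAM role are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftUnloadSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical cluster endpoint and database, following the
    // jdbc:redshift://<endpoint-address>:<endpoint-port>/<db-name> format.
    String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2"
        + ".redshift.amazonaws.com:5439/dev";
    try (Connection conn = DriverManager.getConnection(url, "masteruser", "MasterPass1");
         Statement stmt = conn.createStatement()) {
      // UNLOAD runs entirely inside Redshift: the cluster executes the quoted
      // SELECT and writes the results directly to S3. Only the UNLOAD command
      // itself travels over this JDBC connection.
      stmt.execute(
          "UNLOAD ('SELECT * FROM venue') "
        + "TO 's3://mybucket/unload/venue_' "
        + "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'");
    }
  }
}
```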

Use this plugin when you need to quickly move data from Redshift to an S3 bucket before processing it in a pipeline. For example, a financial services customer might want to quickly unload financial reports, generated by processing that runs in Redshift, into S3. The pipeline would start with a Redshift to S3 action and then use the S3 source to read that data into a processing pipeline.

Configuration

| Property | Macro Enabled? | Description |
| -------- | -------------- | ----------- |
| Query | Yes | Required. A SELECT query whose results are unloaded from the Redshift table to the S3 bucket. |
| JDBC Redshift Cluster Database URL | Yes | Required. JDBC Redshift database URL for connecting to the Redshift cluster. The URL should include the port and database name, in the format jdbc:redshift://<endpoint-address>:<endpoint-port>/<db-name>. This connection is used only to issue the UNLOAD command, not to execute the query and fetch its results. The plugin uses com.amazon.redshift.jdbc42.Driver to make connections. |
| Redshift Master User | Yes | Required. Master user for the Redshift cluster to connect to. |
| Redshift Master Password | Yes | Required. Master password for the Redshift cluster to connect to. |
| S3 Data Path | Yes | Required. The full path, including bucket name, to the location on Amazon S3 where Amazon Redshift writes the output file objects, including the manifest file if MANIFEST is specified. It should be in the format s3://object-path/name-prefix. |
| Access Key | Yes | Optional. Access key provided by AWS so that the Redshift cluster can write to the S3 location. |
| Secret Access Key | Yes | Optional. Secret access key provided by AWS so that the Redshift cluster can write to the S3 location. |
| AWS IAM Role | Yes | Optional. IAM role with GET, LIST, and PUT permissions on the S3 bucket, in the form arn:aws:iam::<aws-account-id>:role/<role-name>. For more information about IAM roles and the UNLOAD command, see the AWS Documentation. |
| Output Path Token | Yes | Optional. The macro key under which the S3 file path of the unloaded file(s) is stored. Plugins that run at later stages in the pipeline can retrieve the file path through macro substitution with this key; for example, ${filePath} if filePath is the key specified. Default is filePath. |
| Manifest | Yes | Optional. Whether to create a manifest file during the unload. The manifest file explicitly lists the data files created by the UNLOAD process and is written to the same directory as the unloaded files. Default is false. |
| S3 Delimiter | Yes | Optional. Single ASCII character used to separate fields in the output file. Default is \| (pipe). |
| Parallel | Yes | Optional. Whether UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. Default is true. |
| Compression | Yes | Optional. Unloads data into one or more compressed files. One of NONE, BZIP2, or GZIP. Default is NONE. |
| Allow Overwrite | Yes | Optional. Whether UNLOAD overwrites existing files, including the manifest file, if they already exist. Default is false. |
| Add Quotes | Yes | Optional. Whether UNLOAD places quotation marks around each unloaded data field, so that Redshift can unload data values that contain the delimiter itself. Default is false. |
| Escape | Yes | Optional. Whether an escape character (\) is placed before CHAR and VARCHAR data in the delimited unload files for the following characters: linefeed (\n), carriage return (\r), the delimiter character, the escape character (\), and quote characters (" or '). Default is false. |
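
Most of the optional properties above correspond to clauses of the UNLOAD statement. As a rough illustration only (the exact statement the plugin builds is an assumption here, and the table, bucket, and role ARN are placeholders), a configuration with a comma delimiter and with Manifest, Parallel, GZIP Compression, Allow Overwrite, Add Quotes, and Escape enabled would map to something like:

```java
public class UnloadClauseMapping {
  // Hypothetical mapping from plugin properties to UNLOAD clauses; the table,
  // bucket, and role ARN are placeholders, not values from this documentation.
  static final String UNLOAD_SQL =
      "UNLOAD ('SELECT * FROM sales') "                           // Query
    + "TO 's3://mybucket/reports/sales_' "                        // S3 Data Path
    + "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' " // AWS IAM Role
    + "MANIFEST "                                                 // Manifest = true
    + "DELIMITER ',' "                                            // S3 Delimiter = ,
    + "PARALLEL ON "                                              // Parallel = true
    + "GZIP "                                                     // Compression = GZIP
    + "ALLOWOVERWRITE "                                           // Allow Overwrite = true
    + "ADDQUOTES "                                                // Add Quotes = true
    + "ESCAPE";                                                   // Escape = true
}
```

Once the action completes, later stages can locate the unloaded files through the Output Path Token: with the default key, a downstream stage such as the S3 source can reference the files with the macro ${filePath}.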

Usage Notes

  1. Do not provide both the Access/Secret Keys and the IAM Role at the same time; provide exactly one authentication mechanism.

  2. The Redshift table from which the data is unloaded must already exist in the database specified by the JDBC Redshift Cluster Database URL.

  3. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.

  4. The S3 data path must use the s3:// URI scheme, not s3n:// or s3a://.
