Redshift to S3 Action (Deprecated)
This plugin is no longer available as of July 26, 2024.
The Redshift to S3 Action runs the UNLOAD command on Amazon Redshift to save the results of a query to one or more files on Amazon Simple Storage Service (Amazon S3). For more information on the UNLOAD command, see the AWS documentation.
Use this plugin when you need to quickly move data from Redshift to an S3 bucket before processing it in a pipeline. For example, a financial customer might want to quickly unload financial reports, generated by processing in Redshift, into S3. Such a pipeline would begin with a Redshift to S3 action and then use the S3 source to read that data into a processing pipeline.
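For reference, here is a minimal sketch of the kind of UNLOAD call this action issues, written with psycopg2. The endpoint, credentials, table, and bucket are placeholder assumptions, not values tied to this plugin:

```python
# Minimal sketch of the UNLOAD call this action wraps, using psycopg2.
# The endpoint, database, credentials, and bucket below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="masteruser",
    password="********",
)
with conn, conn.cursor() as cur:
    # UNLOAD writes the query results directly from Redshift to S3.
    # The CREDENTIALS clause may instead be an IAM_ROLE clause (see Usage Notes).
    cur.execute("""
        UNLOAD ('SELECT * FROM financial_reports')
        TO 's3://my-bucket/reports/part_'
        CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
        DELIMITER '|'
        PARALLEL ON
    """)
conn.close()
```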
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Query | Yes | Required. A SELECT query whose results are unloaded from the Redshift table to the S3 bucket. |
JDBC Redshift Cluster Database URL | Yes | Required. JDBC Redshift database URL for connecting to the Redshift cluster. The URL should include the port and database name, in the format `jdbc:redshift://<endpoint>:<port>/<database>`. |
Redshift Master User | Yes | Required. Master user for the Redshift cluster to connect to. |
Redshift Master Password | Yes | Required. Master password for Redshift cluster to connect to. |
S3 Data Path | Yes | Required. The full path, including bucket name, to the location on Amazon S3 where Amazon Redshift will write the output file objects, including the manifest file if MANIFEST is specified. It should be in the format `s3://<bucket-name>/<object-prefix>`. |
Access Key | Yes | Optional. The Access Key provided by AWS so that the Redshift cluster can write to the S3 location. |
Secret Access Key | Yes | Optional. AWS Secret Key provided by AWS so that the Redshift cluster can write to the S3 location. |
AWS IAM Role | Yes | Optional. IAM role with GET, LIST, and PUT permissions on the S3 bucket. The IAM Role should be in the form `arn:aws:iam::<account-id>:role/<role-name>`. |
Output Path Token | Yes | Optional. The macro key used to store the S3 file path for the unloaded file(s). Plugins that run at later stages in the pipeline can retrieve the file path through macro substitution on this key. |
Manifest | Yes | Optional. Used to determine if a manifest file is to be created during the unload. The manifest file explicitly lists the data files that are created by the UNLOAD process. It will be created in the same directory as the UNLOADed files. Default is false. |
S3 Delimiter | Yes | Optional. Single ASCII character that is used to separate fields in the output file. Default is the pipe character (\|). |
Parallel | Yes | Optional. Used to determine if UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. Default is true. |
Compression | Yes | Optional. Unloads data into one or more compressed files. Can be one of the following: NONE, BZIP2 or GZIP. Default is NONE. |
Allow Overwrite | Yes | Optional. Used to determine if UNLOAD will overwrite existing files, including the manifest file, if the file is already available. Default is false. |
Add Quotes | Yes | Optional. Used to determine if UNLOAD places quotation marks around each unloaded data field, so that Redshift can unload data values that contain the delimiter itself. Default is false. |
Escape | Yes | Optional. Used to determine if an escape character (\) is placed before CHAR and VARCHAR columns in the delimited unload files for the following characters: linefeed, carriage return, the delimiter character, the escape character itself, and quotation marks (when Add Quotes is also enabled). Default is false. |
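To make the mapping concrete, the sketch below shows how the optional properties above translate into UNLOAD clauses. The function name and option keys are hypothetical, chosen for illustration; they are not the plugin's actual configuration API:

```python
# Illustrative sketch of how the table's options map onto UNLOAD clauses.
# build_unload and the opts keys are hypothetical, not the plugin's API.
def build_unload(query: str, s3_path: str, iam_role: str, opts: dict) -> str:
    clauses = [f"UNLOAD ('{query}')", f"TO '{s3_path}'", f"IAM_ROLE '{iam_role}'"]
    if opts.get("manifest"):                       # Manifest -> MANIFEST
        clauses.append("MANIFEST")
    clauses.append(f"DELIMITER '{opts.get('delimiter', '|')}'")  # S3 Delimiter
    clauses.append("PARALLEL ON" if opts.get("parallel", True) else "PARALLEL OFF")
    if opts.get("compression", "NONE") != "NONE":  # Compression -> BZIP2 / GZIP
        clauses.append(opts["compression"])
    if opts.get("allow_overwrite"):                # Allow Overwrite -> ALLOWOVERWRITE
        clauses.append("ALLOWOVERWRITE")
    if opts.get("add_quotes"):                     # Add Quotes -> ADDQUOTES
        clauses.append("ADDQUOTES")
    if opts.get("escape"):                         # Escape -> ESCAPE
        clauses.append("ESCAPE")
    return "\n".join(clauses)

print(build_unload("SELECT * FROM financial_reports",
                   "s3://my-bucket/reports/part_",
                   "arn:aws:iam::123456789012:role/RedshiftUnload",
                   {"manifest": True, "escape": True}))
```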
Usage Notes
Access/Secret Keys and an IAM Role must not both be provided at the same time; supply exactly one authentication mechanism.
The Redshift table from which the data is unloaded must already exist in the database specified by the JDBC Redshift Cluster Database URL.
The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.
The S3 data path should start with the s3:// URI scheme, not s3n:// or s3a://.
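A small pre-flight check for the last two notes can be sketched with boto3; the bucket, path, and region values below are placeholder assumptions:

```python
# Pre-flight check for the usage notes above: the S3 path must use the
# s3:// scheme, and the bucket must be in the same region as the cluster.
import boto3

s3_path = "s3://my-bucket/reports/"  # placeholder data path
cluster_region = "us-east-1"         # placeholder cluster region

assert s3_path.startswith("s3://"), "Use s3://, not s3n:// or s3a://"

bucket = s3_path[len("s3://"):].split("/", 1)[0]
# get_bucket_location returns None as the constraint for us-east-1, per the S3 API.
location = boto3.client("s3").get_bucket_location(Bucket=bucket)["LocationConstraint"]
bucket_region = location or "us-east-1"
assert bucket_region == cluster_region, "Bucket must be in the cluster's region"
```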
Created in 2020 by Google Inc.