Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction

One of the most efficient ways to load data from Amazon Redshift to s3 is using the UNLOAD command:  http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html

Use case(s)

  • A financial customer would like to quickly unload financial reports into S3 that have been generated from processing that is happening in Redshift. The pipeline would have a Redshift to S3 action at the beginning, and then leverage the s3 source to read that data into a processing pipeline.

User Storie(s)

  • As a user, i would like to unload data from redshift to s3 using the unload command.
  • I would like to authenticate with IAM credentials as well as id and secret key pairs.
  • I would like to use that s3 data as an input into a hydrator pipeline
  • I would like the location of the data to be passed via workflow token so that they next plugin can use it in a macro

Plugin Type

  •  Action

Configurables

This section defines properties that are configurable for this plugin. 

User Facing NameTypeDescriptionConstraints
QueryStringselect statement to be used for unloading the data 
Access KeyStringAWS access key for S3 
Secret Access KeyStringAWS secret access key for S3 
AWS IAM RoleStringIAM Role 
S3 BucketStringAmazon S3 bucket (including key prefix) 
ManifestBooleanSpecify whether manifest file is to be created during unloading the data into S3 
S3 DelimiterStringThe delimiter by which fields in a character-delimited file are to be separated 
ParallelStringTo write data in parallel to multiple files, according to the number of slices in the cluster or to a single file. The default option is ON or TRUE. 
CompressionStringUnload a compressed file of type BZIP2 or GZIP ALLOWOVERWRITE
Allow Over-WriteStringBy default, UNLOAD fails if it finds files that it would possibly overwrite. If ALLOWOVERWRITE is specified, UNLOAD will overwrite existing files, including the manifest file. 
Add Quotes Places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself. If you use ADDQUOTES, you must specify REMOVEQUOTES in the COPY if you reload the data. 
Redshift Cluster DB UrlStringJDBC Redshift DB url for connecting to the redshift cluster 
Master UserStringMaster user for redshift 
Master User PasswordStringMaster user password 
Redshift Table NameStringRedshift table name from which data is to be unloaded 

Design / Implementation Tips

Design

  1. User can connect to the S3 buckets using Either access and secret access key or using IAM role.
  2. By default unload command assumes that Redshift and s3 are in the same region, If S3 bucket is not in same region as of the Redshift cluster, then user can provide the region using 'Region for S3'.
 
  1. For using the IAM role to unload the data, the IAM role should have GET, LIST and PUT access on the bucket.
  2. UNLOAD automatically creates files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE-S3). UNLOAD does not support Amazon S3 server-side encryption with encryption keys from SSE-KMS or a customer-supplied key (SSE-C).
  3. Plugin would always unload data with Amazon S3 server-side encryption.

Conditions

  1. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.(http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)

 

Approach(s)

Properties

Code Block
languagejava
titleRedshiftToS3 - Action: Json
{
  "name": "RedshiftToS3Action",
  "type": "action",
  "properties": {
    "accessKey": "accessKey",
    "secretAccessKey": "secretAccessKey",
    "iamRole": "arn:aws:iam::0123456789012:role/MyRedshiftRole",
    "query": "select * from venue",
    "s3Bucket": "s3://mybucket/test/",
    "manifest": "false",
    "delimiter": ",",
    "fixedWidth": "",
    "parallel": "off",
    "compression" : "none",
    "allowOverwrite" : "true",
    "addQuotes": "true",
    "redshiftClusterAddress" : "jdbc:redshift://test.cntuu3e3qg5h.us-west-1.redshift.amazonaws.com:5439/redshiftdb",
    "redshiftMasterUser" : "admin",
    "redshiftMasterPassword": "admin",
    "redshiftTableName": "testTable"
  }
}

 

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

 

 

Table of Contents

Table of Contents
stylecircle

Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature