Introduction

One of the most efficient ways to move data from Amazon Redshift to S3 is the UNLOAD command: http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html

Use case(s)

  • A financial customer would like to quickly unload financial reports, generated by processing in Redshift, into S3. The pipeline would start with a Redshift-to-S3 action and then use the S3 source to read that data into a processing pipeline.

User Story(s)

  • As a user, I would like to unload data from Redshift to S3 using the UNLOAD command.
  • I would like to authenticate with IAM credentials as well as with access key ID and secret key pairs.
  • I would like to use that S3 data as an input to a Hydrator pipeline.
  • I would like the location of the data to be passed via a workflow token so that the next plugin can use it in a macro.
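The workflow-token handoff in the last story can be sketched as follows. This is a hypothetical illustration only (the token key name and plain-dict token are assumptions; the real CDAP/Hydrator WorkflowToken API differs):

```python
# Hypothetical sketch of passing the unload location downstream via a
# workflow token; a plain dict stands in for the real token object.
workflow_token = {}

def redshift_to_s3_action(token, s3_bucket):
    # After a successful UNLOAD, record where the files were written.
    token["redshifttos3.output.path"] = s3_bucket

def s3_source_config(token):
    # A downstream plugin resolves the path via a macro-like lookup.
    return {"path": token["redshifttos3.output.path"]}

redshift_to_s3_action(workflow_token, "s3://mybucket/test/")
print(s3_source_config(workflow_token))
```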

Plugin Type

  •  Action

Configurables

This section defines properties that are configurable for this plugin. 

User Facing Name | Type | Description | Constraints
Query | String | Select statement to be used for unloading the data |
Access Key | String | AWS access key for S3 |
Secret Access Key | String | AWS secret access key for S3 |
AWS IAM Role | String | IAM Role |
S3 Bucket | String | Amazon S3 bucket (including key prefix), in the format 's3://object-path/name-prefix' |
Manifest | Boolean | Whether a manifest file is created while unloading the data into S3 |
S3 Delimiter | String | The delimiter by which fields in a character-delimited file are separated |
Parallel | String | Whether to write data in parallel to multiple files (according to the number of slices in the cluster) or to a single file. The default is ON (TRUE). |
Compression | String | Unload to a compressed file of type BZIP2 or GZIP |
Allow Over-Write | String | By default, UNLOAD fails if it finds files that it would possibly overwrite. If ALLOWOVERWRITE is specified, UNLOAD overwrites existing files, including the manifest file. |
Add Quotes | Boolean | Places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself. If ADDQUOTES is used, REMOVEQUOTES must be specified in the COPY command when reloading the data. |
Escape | Boolean | If true, an escape character (\) is placed before every occurrence of the following in CHAR and VARCHAR columns of delimited unload files: linefeed (\n), carriage return (\r), the delimiter character specified for the unloaded data, the escape character (\), and the quote character (" or ', if both ESCAPE and ADDQUOTES are specified in the UNLOAD command). |
Redshift Cluster DB Url | String | JDBC Redshift DB URL for connecting to the Redshift cluster |
Master User | String | Master user for Redshift |
Master User Password | String | Master user password |
Redshift Table Name | String | Redshift table name from which data is to be unloaded |
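As a rough illustration of how these properties could map onto the UNLOAD statement the plugin issues, here is a minimal sketch. The helper name and defaults are assumptions for illustration, not the plugin's actual code:

```python
def build_unload_statement(query, s3_bucket, iam_role=None,
                           access_key=None, secret_access_key=None,
                           manifest=False, delimiter=",", parallel="ON",
                           compression=None, allow_overwrite=False,
                           add_quotes=False, escape=False):
    """Assemble a Redshift UNLOAD statement from the configurable properties."""
    if iam_role:
        credentials = "aws_iam_role=%s" % iam_role
    else:
        credentials = ("aws_access_key_id=%s;aws_secret_access_key=%s"
                       % (access_key, secret_access_key))
    options = ["DELIMITER '%s'" % delimiter, "PARALLEL %s" % parallel]
    if manifest:
        options.append("MANIFEST")
    if compression in ("BZIP2", "GZIP"):
        options.append(compression)
    if allow_overwrite:
        options.append("ALLOWOVERWRITE")
    if add_quotes:
        options.append("ADDQUOTES")
    if escape:
        options.append("ESCAPE")
    # Single quotes inside the query must be doubled when embedded in UNLOAD.
    return ("UNLOAD ('%s') TO '%s' CREDENTIALS '%s' %s"
            % (query.replace("'", "''"), s3_bucket, credentials,
               " ".join(options)))

print(build_unload_statement("select * from venue", "s3://mybucket/test/",
                             iam_role="arn:aws:iam::0123456789012:role/MyRedshiftRole",
                             manifest=True, allow_overwrite=True))
```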

Design / Implementation Tips

Design

  1. The user can connect to the S3 bucket using either an access key and secret access key pair or an IAM role.
  2. To unload the data using an IAM role, the role must have GET, LIST, and PUT access on the bucket.
  3. UNLOAD automatically creates files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE-S3). UNLOAD does not support Amazon S3 server-side encryption with SSE-KMS or a customer-supplied key (SSE-C).
  4. The plugin will therefore always unload data with Amazon S3 server-side encryption.

Conditions

  1. The plugin will unload data using Amazon S3 server-side encryption (SSE-S3).

 

Approach(s)

  1. The plugin will check that the query starts with SELECT and contains a FROM clause. Any other error in the SQL statement will result in a run-time failure.
  2. The user can provide either credentials (accessKey and secretAccessKey) or an IAM role (which has GET, LIST, and PUT permissions on the bucket) for unloading the data from Redshift to S3.
  3. The S3 bucket is the full path, including the bucket name, to the location on Amazon S3 where Amazon Redshift will write the output files. The bucket must reside in the same region as the cluster (http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html).
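The pre-check in step 1 could look like the sketch below (a lightweight, assumed validation, not the plugin's actual implementation; anything subtler is left to Redshift and surfaces as a run-time failure):

```python
import re

def validate_unload_query(query):
    """Return True only if the query starts with SELECT and contains a
    FROM clause, mirroring the plugin's documented pre-check."""
    normalized = query.strip().lower()
    return (normalized.startswith("select")
            and re.search(r"\bfrom\b", normalized) is not None)

print(validate_unload_query("select * from venue"))   # accepted
print(validate_unload_query("delete from venue"))     # rejected
```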

 

  4. UNLOAD writes one or more file objects, including a manifest file if MANIFEST is specified. The object names are prefixed with the name-prefix given in the S3 bucket path.
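For illustration, per the Redshift UNLOAD documentation, the objects produced for 's3://object-path/name-prefix' are slice-numbered data files plus '<name-prefix>manifest' when MANIFEST is specified. The helper and exact file-number suffix below are assumptions for the sketch:

```python
def expected_object_names(bucket_path, slices=2):
    """Sketch of the S3 object names UNLOAD produces for a given
    's3://object-path/name-prefix' bucket path."""
    # One or more data files per cluster slice, e.g. name-prefix0000_part_00
    names = ["%s%04d_part_00" % (bucket_path, s) for s in range(slices)]
    # The manifest object, when MANIFEST is specified
    names.append(bucket_path + "manifest")
    return names

print(expected_object_names("s3://mybucket/test/venue_"))
```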

 

Properties

Code Block
language: json
title: RedshiftToS3 - Action: JSON
{
  "name": "RedshiftToS3Action",
  "type": "action",
  "properties": {
    "accessKey": "accessKey",
    "secretAccessKey": "secretAccessKey",
    "iamRole": "arn:aws:iam::0123456789012:role/MyRedshiftRole",
    "query": "select * from venue",
    "s3Bucket": "s3://mybucket/test/",
    "manifest": "false",
    "delimiter": ",",
    "fixedWidth": "",
    "parallel": "off",
    "compression" : "none",
    "allowOverwrite" : "true",
    "addQuotes": "true",
    "redshiftClusterAddress" : "jdbc:redshift://test.cntuu3e3qg5h.us-west-1.redshift.amazonaws.com:5439/redshiftdb",
    "redshiftMasterUser" : "admin",
    "redshiftMasterPassword": "admin",
    "redshiftTableName": "testTable"
  }
}

 

Security

Limitation(s)

  1. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.(http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

 

 


Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature