Snowflake Data Loading Action

Introduction

Data loading in Snowflake (https://docs.snowflake.net/manuals/user-guide/data-load-overview.html) is performed in two separate steps. The first step loads the files into a staging area, and the second step triggers the load with a COPY INTO command. This action plugin executes both steps using Snowflake JDBC.
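Both steps run over a single JDBC connection. A minimal sketch of assembling the connection URL from the account, warehouse, database, and schema settings (the `buildJdbcUrl` helper and its example values are illustrative, not part of the plugin; the URL shape follows the Snowflake JDBC driver documentation):

```java
// Sketch: assemble a Snowflake JDBC connection URL from plugin settings.
// The helper name and example values are illustrative; only the URL shape
// (jdbc:snowflake://<account>.snowflakecomputing.com/?...) follows the
// Snowflake JDBC driver documentation.
public class SnowflakeUrl {
  static String buildJdbcUrl(String account, String warehouse,
                             String db, String schema) {
    return String.format(
        "jdbc:snowflake://%s.snowflakecomputing.com/?warehouse=%s&db=%s&schema=%s",
        account, warehouse, db, schema);
  }

  public static void main(String[] args) {
    String url = buildJdbcUrl("myaccount", "LOAD_WH", "ANALYTICS", "PUBLIC");
    System.out.println(url);
    // A java.sql.Connection would then be opened with
    // DriverManager.getConnection(url, userName, password).
  }
}
```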

Use case(s)

  • As a financial services company, I would like to build an analytics pipeline to push data into Snowflake. I would leverage Hydrator and use this plugin as an action to create files in the staging area and trigger the loading of the data.

User Story(s)

  • I would like to create files in the staging area and trigger loading of the data.

  • I would like the capability to load files from both internal and external (S3) locations.

Plugin Type

Action

Configurables

This section defines properties that are configurable for this plugin. 

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Account Name | String | Snowflake account name | |
| User Name | String | Username | |
| Password | String | Password | |
| Warehouse Name | String | Snowflake warehouse | |
| Database Name | String | Database | |
| Schema | String | DB schema | |
| Table Name | String | Table where the data is to be loaded | |
| Stage Name | String | Staging area where the data is loaded intermediately | |
| Data Source | Select | Source of the data to be loaded. Defaults to Internal. | |
| File Path/S3 Bucket URL | String | Path/URL (AWS S3) of the data to be loaded | |
| File Format Type | Select | File format specifying the type of file. Defaults to CSV. | Possible values: CSV, AVRO, PARQUET, JSON, XML |
| Authentication Method | Select | Authentication method. Defaults to Access Credentials. | For IAM authentication, the cluster should be hosted on AWS servers |
| S3 Access Key | String | Access key for AWS S3 to connect to | Mandatory if Data Source is external (AWS S3) |
| S3 Secret Access Key | String | Secret access key for AWS S3 to connect to | Mandatory if Data Source is external (AWS S3) |

Design / Implementation Tips

Design

Approach(s)

1. For loading files from an internal location, the following high-level commands are executed in Snowflake using JDBC:
     a. CREATE OR REPLACE STAGE
     b. PUT files onto the staging location from the specified path; globbing patterns (i.e. wildcards) are accepted.
     c. COPY INTO the specified Snowflake table.
2. For loading files from an external location, i.e. S3, the following high-level commands are executed in Snowflake using JDBC:
     a. CREATE OR REPLACE STAGE using the URL option
     b. COPY INTO the specified Snowflake table having a single VARIANT column.
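The command sequences above can be sketched as plain SQL strings that the action would then execute with java.sql.Statement over the JDBC connection. The helper names below are illustrative, not part of the plugin; the statement shapes follow Snowflake's CREATE STAGE, PUT, and COPY INTO documentation:

```java
// Sketch: build the SQL command sequences the action would issue over JDBC.
// Helper names are illustrative; the statement shapes follow the Snowflake
// CREATE STAGE, PUT, and COPY INTO documentation.
import java.util.Arrays;
import java.util.List;

public class LoadCommands {

  // Internal load: create stage, PUT local files, then COPY INTO the table.
  static List<String> internalLoad(String stage, String path, String table) {
    return Arrays.asList(
        "CREATE OR REPLACE STAGE " + stage,
        // Globbing patterns (wildcards) in 'path' are accepted by PUT.
        "PUT file://" + path + " @" + stage,
        "COPY INTO " + table + " FROM @" + stage);
  }

  // External load: create stage over an S3 URL, then COPY INTO the table.
  static List<String> externalLoad(String stage, String s3Url,
                                   String accessKey, String secretKey,
                                   String table) {
    return Arrays.asList(
        "CREATE OR REPLACE STAGE " + stage
            + " URL='" + s3Url + "'"
            + " CREDENTIALS=(AWS_KEY_ID='" + accessKey
            + "' AWS_SECRET_KEY='" + secretKey + "')",
        "COPY INTO " + table + " FROM @" + stage);
  }

  public static void main(String[] args) {
    internalLoad("tmp_stage", "/data/*.csv", "SALES")
        .forEach(System.out::println);
    externalLoad("s3_stage", "s3://bucket/prefix/", "EXAMPLE_KEY",
                 "EXAMPLE_SECRET", "EVENTS")
        .forEach(System.out::println);
  }
}
```

Each statement would be run in order; a failure at any step aborts the load.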

Properties

accountName: Snowflake account name
userName: Username
password: Password
warehouse: Snowflake warehouse
dbName: Database Name
schema: DB Schema
tableName: The Snowflake table name where the data will be loaded
stageName: The staging location where the data from the data source will be loaded
dataSource: Source of the data to be loaded. Defaults to Internal
path: Path/URL(AWS S3) of the data to be loaded
fileFormatType: File format to specify the type of file. Defaults to CSV
s3AuthenticationMethod: Authentication method. Defaults to Access Credentials. For IAM authentication, cluster should be hosted on AWS servers
accessKey: Access key for AWS S3 to connect to. Mandatory if dataSource is AWS S3
secretAccessKey: Secret access key for AWS S3 to connect to. Mandatory if dataSource is AWS S3

Limitation(s)

1. The COPY INTO command ignores staged data files that were already loaded into the table. The FORCE parameter is not used, since reloading the same files is not required. See https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html.
2. The JSON/XML/AVRO/PARQUET file formats can produce one and only one column, of type VARIANT, OBJECT, or ARRAY. Use the CSV file format if you want to load more than one column.
3. Default values for the format options are used. See https://docs.snowflake.net/manuals/sql-reference/sql/create-stage.html.
4. CREATE OR REPLACE STAGE is used to create the stage, so that a stage that already exists with a different set of files is replaced.
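Limitation 2 shows up in the FILE_FORMAT clause of COPY INTO: CSV columns map to multiple table columns, while semi-structured formats land in a single VARIANT column. A sketch of the two statement shapes (the `copyInto` helper is illustrative; the clause syntax follows the Snowflake COPY INTO documentation):

```java
// Sketch: the FILE_FORMAT clause decides how many target columns a load
// can populate. CSV maps to multiple columns; JSON/AVRO/PARQUET/XML load
// into a single VARIANT column. The helper name is illustrative.
public class CopyStatements {
  static String copyInto(String table, String stage, String formatType) {
    return "COPY INTO " + table + " FROM @" + stage
        + " FILE_FORMAT = (TYPE = " + formatType + ")";
  }

  public static void main(String[] args) {
    // CSV: file columns map positionally to table columns.
    System.out.println(copyInto("SALES", "tmp_stage", "CSV"));
    // JSON: each record lands in the table's single VARIANT column.
    System.out.println(copyInto("RAW_EVENTS", "tmp_stage", "JSON"));
  }
}
```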

Future Work

  • Some future work – HYDRATOR-99999

  • Another future work – HYDRATOR-99999

Test Case(s)

  • Load Avro data from S3 to Snowflake using S3 keys

  • Load Parquet data from S3 to Snowflake using S3 keys

Sample Pipeline

Pipeline #1

Pipeline #2


Checklist

User stories documented 
User stories reviewed 
Design documented 
Design reviewed 
Feature merged 
Examples and guides 
Integration tests 
Documentation for feature 
Short video demonstrating the feature

Created in 2020 by Google Inc.