ADLS Sink

Plugin version: 0.15.0

The Azure Data Lake Store (ADLS) sink writes data to an Azure Data Lake Store directory in avro, orc, or text format.

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Reference Name | No | Required. Name used to uniquely identify this sink for lineage, annotating metadata, etc. |
| Azure Data Lake Store Path | Yes | Required. Path to the directory in which to store output files. The path must start with adl:// |
| Azure Data Lake Store Client Id | Yes | Required. Microsoft Azure client ID, which is typically the Application ID. |
| Azure Data Lake Store Refresh Token URL | Yes | Required. Refresh URL used to access Microsoft Azure Data Lake Store. |
| Azure Data Lake Store Credentials | Yes | Required. Key used to access Microsoft Azure Data Lake Store. |
| File System Properties | Yes | Optional. A JSON string representing a map of properties needed for the distributed file system. |
| File Output Format | Yes | Required. The format of the output files. Must be ‘avro’, ‘text’ or ‘orc’. Defaults to text. |
| Field Delimiter (only when output format is text) | Yes | Optional. Delimiter to place between fields. Only used by the text output format. Defaults to tab. |
| Output Schema | Yes | Output schema. Optional for the text output format; required for the avro and orc output formats. If left empty for the text output format, the schema of the input records is used. The output schema must be a subset of the schema of the input records. Fields of type ARRAY, MAP, and RECORD are not supported with the text format. Fields of type UNION are supported only if they represent a nullable type. |
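
The constraints in the table above can be checked before a pipeline is deployed. The following is a minimal sketch, not the plugin's own validation logic: the dictionary keys `path`, `format`, `schema`, and `fileSystemProperties` are hypothetical names chosen for illustration, and the schema is assumed to be an Avro-style JSON record. The rule that the output schema must be a subset of the input schema is not checked here.

```python
import json

ALLOWED_FORMATS = {"avro", "orc", "text"}
# Complex types that the text format cannot write, per the Output Schema row.
UNSUPPORTED_TEXT_TYPES = {"array", "map", "record"}


def is_nullable_union(field_type):
    """True for a two-branch union in which exactly one branch is "null"."""
    return (isinstance(field_type, list)
            and len(field_type) == 2
            and field_type.count("null") == 1)


def validate_sink_config(config):
    """Return a list of problems found in a hypothetical sink config dict."""
    problems = []
    if not config.get("path", "").startswith("adl://"):
        problems.append("path must start with adl://")

    fmt = config.get("format", "text")  # text is the documented default
    if fmt not in ALLOWED_FORMATS:
        problems.append("format must be avro, orc or text")

    fs_props = config.get("fileSystemProperties")
    if fs_props:
        try:
            if not isinstance(json.loads(fs_props), dict):
                problems.append("file system properties must be a JSON map")
        except json.JSONDecodeError:
            problems.append("file system properties must be valid JSON")

    schema_text = config.get("schema")
    if fmt in ("avro", "orc") and not schema_text:
        problems.append("schema is required for avro and orc")
    if schema_text:
        for field in json.loads(schema_text)["fields"]:
            ftype = field["type"]
            if isinstance(ftype, list):
                if not is_nullable_union(ftype):
                    problems.append(
                        f"field {field['name']}: only nullable unions are supported")
                    continue
                # Unwrap the nullable union to its non-null branch.
                ftype = next(t for t in ftype if t != "null")
            type_name = ftype["type"] if isinstance(ftype, dict) else ftype
            if fmt == "text" and type_name in UNSUPPORTED_TEXT_TYPES:
                problems.append(
                    f"field {field['name']}: {type_name} is not supported with text")
    return problems
```

An empty list means that none of the checked rules were violated; it does not guarantee that the sink will accept the configuration.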

Example

This example connects to Microsoft Azure Data Lake Store and writes files in avro format to the specified directory. It uses the Data Lake Store account at adls.azuredatalakestore.net, along with the Azure Data Lake Store Client Id, Azure Data Lake Store Refresh Token URL, and Azure Data Lake Store Credentials:

| Property | Value |
| --- | --- |
| Reference Name | ADLSBatchSink |
| Azure Data Lake Store Path | adl://adls.azuredatalakestore.net/adls/cdr/ |
| Azure Data Lake Store Client Id | 1016c0cb-aaaa-aaaa-aaaa-aaaaaaaaaaaa |
| Azure Data Lake Store Refresh Token URL | https://login.windows.net/5f3d9a6a-aaaa-aaaa-aaaa-aaaaaaaaaaaa/oauth2/token |
| Azure Data Lake Store Credentials | f1cF7CwFJKlMWXPzAAAA1XB7BErAAAAAAAAAAAAAAAA= |
| File Output Format | avro |
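
To make the structure of the example values easier to see, the sketch below splits the path into its account host and directory, and pulls the first path segment out of the refresh token URL. It is illustrative only: the function names are hypothetical, and it assumes the refresh URL follows the `/<id>/oauth2/token` layout shown in the example, where the first segment is typically the Azure Active Directory tenant ID.

```python
from urllib.parse import urlsplit


def describe_adls_path(path):
    """Split an adl:// path into the account host and the target directory."""
    parts = urlsplit(path)
    if parts.scheme != "adl":
        raise ValueError("path must start with adl://")
    return {"account_host": parts.netloc, "directory": parts.path}


def tenant_from_refresh_url(url):
    """Return the first path segment of the refresh token URL.

    Assumes the /<id>/oauth2/token layout used in the example above.
    """
    segments = urlsplit(url).path.strip("/").split("/")
    return segments[0]


info = describe_adls_path("adl://adls.azuredatalakestore.net/adls/cdr/")
tenant = tenant_from_refresh_url(
    "https://login.windows.net/5f3d9a6a-aaaa-aaaa-aaaa-aaaaaaaaaaaa/oauth2/token")
```

With the example values, `info["account_host"]` is `adls.azuredatalakestore.net`, `info["directory"]` is `/adls/cdr/`, and `tenant` is `5f3d9a6a-aaaa-aaaa-aaaa-aaaaaaaaaaaa`.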

The following example uses the same configuration as the one above, except that File Output Format is “text” and Field Delimiter is “,”:

| Property | Value |
| --- | --- |
| Reference Name | ADLSBatchSink |
| Azure Data Lake Store Path | adl://adls.azuredatalakestore.net/adls/cdr/ |
| Azure Data Lake Store Client Id | 1016c0cb-aaaa-aaaa-aaaa-aaaaaaaaaaaa |
| Azure Data Lake Store Refresh Token URL | https://login.windows.net/5f3d9a6a-aaaa-aaaa-aaaa-aaaaaaaaaaaa/oauth2/token |
| Azure Data Lake Store Credentials | f1cF7CwFJKlMWXPzAAAA1XB7BErAAAAAAAAAAAAAAAA= |
| File Output Format | text |
| Field Delimiter | , |
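
As a rough illustration of what the Field Delimiter setting means for the text format, the sketch below joins the fields of one record into a single line. This is an assumption about the general shape of the output, not the plugin's actual writer: the function name and the sample record are hypothetical, and the handling of null values or of delimiters that appear inside field values is not described in this document and is not modeled here.

```python
def to_text_line(record, field_names, delimiter="\t"):
    """Join the selected fields of a record into one delimited line.

    The default delimiter is a tab, matching the documented default.
    """
    return delimiter.join(str(record[name]) for name in field_names)


# Hypothetical call detail record with simple (non-complex) field types.
record = {"id": 7, "caller": "alice", "seconds": 42}
fields = ["id", "caller", "seconds"]

comma_line = to_text_line(record, fields, delimiter=",")
tab_line = to_text_line(record, fields)
```

Passing a `field_names` list that omits some input fields mirrors the rule that the output schema is a subset of the input schema.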



Created in 2020 by Google Inc.