Introduction
Add options for IAM role-based authentication and server-side encryption to the existing S3 source and sink plugins.
Use case(s)
- In the S3 source and S3 sink (Avro and Parquet) plugins, the user should be able to select the authentication mechanism for S3, including an option for IAM role-based authentication.
- In the S3 source and S3 sink (Avro and Parquet) plugins, the user should be able to enable server-side encryption on S3.
User Story(s)
- As a pipeline user, I want an option for IAM role-based authentication in the S3 source and sink plugins in Hydrator.
- As a pipeline user, I want the access ID and access key to be mandatory when the Access Credentials authentication method is selected.
- As a pipeline user, I want an option to enable server-side encryption in the S3 source and sink plugins in Hydrator.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
The following new configuration options will be added to the S3 plugins:
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Authentication Method | Select | Authentication method used to access S3. Defaults to Access Credentials. IAM role-based authentication works only when running in an AWS (EC2) environment; non-EC2 environments cannot be used. For IAM, the URI scheme must be s3a://. (Macro-enabled) | |
Server Side Encryption | Select | Whether to enable server-side encryption on S3. Defaults to True. | |
Design / Implementation Tips
Design
Authentication:
{
  "widget-type": "select",
  "label": "Authentication Method",
  "name": "authenticationMethod",
  "widget-attributes": {
    "values": [
      "Access Credentials",
      "IAM"
    ],
    "default": "Access Credentials"
  }
}
Server side encryption:
{
  "widget-type": "select",
  "label": "Server Side Encryption",
  "name": "enableEncryption",
  "widget-attributes": {
    "values": [
      "True",
      "False"
    ],
    "default": "True"
  }
}
Approach(s)
1. When the user selects the IAM role-based authentication method, the key-related properties (access ID and access key) are omitted.
2. When the user selects IAM-based authentication and enables server-side encryption, fs.s3a.server-side-encryption-algorithm is set to AES256 (the only supported value).
3. When the user selects Access Credentials authentication and enables server-side encryption, fs.s3n.server-side-encryption-algorithm is set to AES256 (the only supported value).
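The three rules above can be sketched as a small helper that builds the Hadoop file system properties for a given plugin configuration. This is only an illustration, not the plugin's actual code: the class name `S3FileSystemProperties` and its `build` method are hypothetical, and the key property names `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are assumed to be the standard Hadoop s3n credential properties, since this spec does not name them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating the approach; not the plugin's real implementation. */
final class S3FileSystemProperties {
  static final String ACCESS_CREDENTIALS = "Access Credentials";
  static final String IAM = "IAM";

  private S3FileSystemProperties() { }

  /** Builds the file system properties implied by the selected options. */
  static Map<String, String> build(String authenticationMethod, String accessId,
                                   String accessKey, boolean enableEncryption) {
    Map<String, String> props = new LinkedHashMap<>();
    if (IAM.equals(authenticationMethod)) {
      // Rule 1: with IAM, no key-related properties are set.
      if (enableEncryption) {
        // Rule 2: IAM uses the s3a client, so set the s3a encryption property.
        props.put("fs.s3a.server-side-encryption-algorithm", "AES256");
      }
    } else if (ACCESS_CREDENTIALS.equals(authenticationMethod)) {
      // Assumed property names for the s3n credentials (not stated in this spec).
      props.put("fs.s3n.awsAccessKeyId", accessId);
      props.put("fs.s3n.awsSecretAccessKey", accessKey);
      if (enableEncryption) {
        // Rule 3: Access Credentials sets the s3n encryption property.
        props.put("fs.s3n.server-side-encryption-algorithm", "AES256");
      }
    } else {
      throw new IllegalArgumentException(
          "Unknown authentication method: " + authenticationMethod);
    }
    return props;
  }
}
```

For example, `S3FileSystemProperties.build("IAM", null, null, true)` yields a map containing only the s3a encryption property, while the Access Credentials case also carries the two key properties.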
References:
https://issues.apache.org/jira/browse/HADOOP-10568
https://hortonworks.github.io/hdp-aws/s3-encryption/index.html
https://issues.apache.org/jira/browse/HADOOP-13131
Properties
Security
Limitation(s)
1. For all the S3 plugins, only S3 regions that support both signature versions (Version 2 and Version 4) are supported.
2. IAM role-based authentication works only when running in an AWS (EC2) environment. Non-EC2 environments cannot be used.
3. IAM authentication requires the s3a Hadoop client (URI scheme: s3a://).
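The s3a restriction, together with the requirement that the access ID and access key are mandatory for Access Credentials, could be enforced when the plugin configuration is validated. The sketch below is a hypothetical validator (the class `S3ConfigValidator` and its `validate` method are illustrative names, not existing plugin code). It deliberately ignores macro-enabled values, which would need to be skipped at configure time.

```java
/** Hypothetical validation helper; names and messages are illustrative only. */
final class S3ConfigValidator {
  private S3ConfigValidator() { }

  /** Throws IllegalArgumentException if the configuration violates a limitation. */
  static void validate(String authenticationMethod, String accessId,
                       String accessKey, String path) {
    if ("IAM".equals(authenticationMethod)) {
      // IAM authentication is only available through the s3a client.
      if (path == null || !path.startsWith("s3a://")) {
        throw new IllegalArgumentException(
            "IAM authentication requires a path with the s3a:// scheme, got: " + path);
      }
    } else if ("Access Credentials".equals(authenticationMethod)) {
      // Both keys are mandatory for the Access Credentials method.
      if (isBlank(accessId) || isBlank(accessKey)) {
        throw new IllegalArgumentException(
            "Access ID and access key are required for Access Credentials authentication.");
      }
    } else {
      throw new IllegalArgumentException(
          "Unknown authentication method: " + authenticationMethod);
    }
  }

  private static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }
}
```

Failing early with a descriptive message keeps a misconfigured pipeline from being deployed and then failing at run time inside the Hadoop client.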
Future Work
Test Case(s)
- S3 batch source with IAM role-based authentication
- S3 batch source with key credentials
- S3AvroSink with IAM role-based authentication
- S3AvroSink with key credentials
- S3ParquetSink with IAM role-based authentication
- S3ParquetSink with key credentials
Sample Pipeline
S3SourceIAM.json
S3SourceCredentials.json
S3SinkAvroIAM-cdap-data-pipeline.json
S3SinkAvroCredentials-cdap-data-pipeline.json
S3SinkParquetIAM_1-cdap-data-pipeline.json
S3SinkParquetCredentials1-cdap-data-pipeline.json