Installation on Amazon EMR using Bootstrap Actions
This section describes installing CDAP on Amazon EMR clusters using the Amazon EMR "Run If" Bootstrap Action to:
Install necessary EMR components;
Restrict CDAP installation to the EMR master node;
Download, install, and automatically configure CDAP for EMR; and
Run all services as the 'cdap' user.
Information on Amazon EMR is available online.
CDAP 6.2 is compatible with Amazon EMR 4.6.0 through 5.3.1.
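The compatibility range above can be checked in a script before a cluster is requested. The following is a minimal sketch, not part of CDAP or EMR tooling: the function names `version_le` and `emr_release_supported` are made up for illustration, and it assumes a `sort` that supports version ordering (`-V`).

```shell
#!/usr/bin/env bash
# Sketch: check that an EMR release label falls inside the range the text
# lists as compatible (emr-4.6.0 through emr-5.3.1).
# Assumes `sort -V` (version sort) is available.

MIN_RELEASE="4.6.0"   # lower bound quoted in the text
MAX_RELEASE="5.3.1"   # upper bound quoted in the text

# version_le A B: succeeds when version A is less than or equal to version B.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# emr_release_supported LABEL: succeeds when LABEL (e.g. emr-5.2.0) is in range.
emr_release_supported() {
  local version="${1#emr-}"   # strip the "emr-" prefix
  version_le "$MIN_RELEASE" "$version" && version_le "$version" "$MAX_RELEASE"
}

# Hypothetical usage: report on a few sample labels.
for label in emr-4.6.0 emr-5.3.1 emr-5.4.0; do
  if emr_release_supported "$label"; then
    echo "$label: in range"
  else
    echo "$label: out of range"
  fi
done
```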
Using the Create Cluster Wizard
For any settings not listed or specified below, we recommend using the default settings.
Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.
Choose "Create cluster."
In the Advanced Options, Step 1: Software and Steps, set:
Vendor: Amazon
Release: emr-4.6.0 through emr-5.3.1
Software: Hadoop, HBase, Spark
No auto-terminate
EMR Create Cluster Wizard: Step 1: Software and Steps
In Step 2: Hardware, set:
Network: use defaults
EC2 Subnet: use defaults
Master
EC2 Instance type: m3.xlarge
Instance count: 1
Core
EC2 Instance type: m3.xlarge
Instance count: 4 (as a minimum)
Task
Instance count: 0 (not required)
EMR Create Cluster Wizard: Step 2: Hardware
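For readers who script cluster creation instead of using the wizard, the Step 1 and Step 2 choices could map to AWS CLI arguments roughly as sketched here. This is a partial, dry-run illustration: it only prints the arguments, `emr-5.3.1` is picked as one example from the supported range, and a real `aws emr create-cluster` call would also need roles, a key pair, and the bootstrap action. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the Step 1 and Step 2 wizard choices expressed as AWS CLI arguments.
# Nothing is sent to AWS; the script only prints the command for review.

RELEASE_LABEL="emr-5.3.1"   # example value; any release in the supported range
INSTANCE_TYPE="m3.xlarge"   # instance type given in the text
CORE_COUNT=4                # the text gives 4 as the minimum for core nodes

# Prints one argument per line so callers can inspect or reuse them.
emr_software_hardware_args() {
  printf '%s\n' \
    --release-label "$RELEASE_LABEL" \
    --applications Name=Hadoop Name=HBase Name=Spark \
    --instance-groups \
    "InstanceGroupType=MASTER,InstanceCount=1,InstanceType=$INSTANCE_TYPE" \
    "InstanceGroupType=CORE,InstanceCount=$CORE_COUNT,InstanceType=$INSTANCE_TYPE" \
    --no-auto-terminate
}

# Dry run: show the arguments on one line after the base command.
echo "aws emr create-cluster $(emr_software_hardware_args | tr '\n' ' ')"
```

No task instance group is listed, which matches the instance count of 0 given for Task nodes.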
In Step 3: General Cluster Settings, set:
Logging
Debugging
Termination protection (no auto-terminate)
EMR Create Cluster Wizard: Step 3: General Cluster Settings
In Step 3: General Cluster Settings, add a Bootstrap Action:
Type: Run If
Optional arguments:
instance.isMaster=true "curl https://downloads.cdap.io/emr/install-6.0.0.sh | sudo bash -s"
EMR Create Cluster Wizard: Add Bootstrap Action
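The same bootstrap action could be described in a JSON file and passed to the AWS CLI. The sketch below writes such a file; the two arguments are taken from the wizard settings above, while the S3 location of the Run If script, the action name, and the output file name are assumptions to verify for your region and account.

```shell
#!/usr/bin/env bash
# Sketch: write the Run If bootstrap action as a JSON file for the AWS CLI.

# Assumed S3 location of the EMR run-if script; confirm it for your region.
RUN_IF_PATH="s3://elasticmapreduce/bootstrap-actions/run-if"
# Install command exactly as given in the wizard's optional arguments.
INSTALL_CMD="curl https://downloads.cdap.io/emr/install-6.0.0.sh | sudo bash -s"
# Hypothetical output file name.
BOOTSTRAP_JSON="${BOOTSTRAP_JSON:-cdap-bootstrap.json}"

# write_bootstrap_json FILE: writes a one-element bootstrap action list.
write_bootstrap_json() {
  local out_file="$1"
  cat > "$out_file" <<EOF
[
  {
    "Name": "Install CDAP on master",
    "Path": "$RUN_IF_PATH",
    "Args": ["instance.isMaster=true", "$INSTALL_CMD"]
  }
]
EOF
}

write_bootstrap_json "$BOOTSTRAP_JSON"
cat "$BOOTSTRAP_JSON"
```

The resulting file would typically be supplied as `--bootstrap-actions file://cdap-bootstrap.json` on an `aws emr create-cluster` command. The first argument restricts the action to the master node, and the second is the command that Run If executes there.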
In Step 4: Security, keep the default settings, and then add a security group (next step).
EMR Create Cluster Wizard: Step 4: Security
In Step 4: Security, assign an additional EC2 Security Group to the master node:
Master (one of the following):
A Security Group with ports 11011/11015 open; or
An SSH Tunnel
EMR Create Cluster Wizard: Assigning additional security group to master node
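If the SSH tunnel option is chosen, a tunnel for the two ports named above might look like the following sketch. It only prints the command. The key file and host name are placeholders, the function name is hypothetical, and `hadoop` is assumed to be the login user on the EMR master node; the master's security group would still need to allow SSH from your machine.

```shell
#!/usr/bin/env bash
# Sketch: build an SSH command that forwards ports 11011 and 11015 from the
# local machine to the EMR master node.

# cdap_tunnel_cmd KEY_FILE MASTER_DNS: prints the ssh command to run.
cdap_tunnel_cmd() {
  local key_file="$1" master_dns="$2"
  # -N: run no remote command; -L: forward a local port to the remote side.
  printf 'ssh -i %s -N -L 11011:localhost:11011 -L 11015:localhost:11015 hadoop@%s\n' \
    "$key_file" "$master_dns"
}

# Placeholder values; replace with your key pair file and master public DNS.
cdap_tunnel_cmd "$HOME/my-key.pem" "ec2-0-0-0-0.compute-1.amazonaws.com"
```

With the tunnel running, the forwarded ports are reachable on `localhost` without opening them in a security group.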
Once the cluster is created, CDAP services will start up. This will take about 10 minutes after the cluster is in a Waiting state.
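Because startup takes several minutes, a script can poll until CDAP responds instead of waiting a fixed time. This is a minimal sketch under assumptions: the `/ping` path on port 11015 is assumed to be answered by the CDAP router, the URL presumes a tunnel or an open port, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll until CDAP answers, or give up after a number of attempts.

# Assumed health-check URL; adjust host, port, and path for your cluster.
CDAP_URL="${CDAP_URL:-http://localhost:11015/ping}"

# Default probe: succeeds when the URL returns an HTTP success status.
cdap_probe() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# wait_for_cdap ATTEMPTS DELAY_SECONDS [PROBE_COMMAND]
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_for_cdap() {
  local attempts="$1" delay="$2" probe="${3:-cdap_probe}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$probe" "$CDAP_URL"; then
      echo "CDAP responded after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "CDAP did not respond after $attempts attempt(s)" >&2
  return 1
}

# Example: try every 30 seconds, up to 30 times (about 15 minutes).
# wait_for_cdap 30 30
```

The optional third argument lets a different probe command be substituted, for example when checking the UI port instead.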
Created in 2020 by Google Inc.