Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, YARN, and ZooKeeper. Hive and Spark are optional components: Hive is required to enable CDAP's ad-hoc querying capabilities (CDAP Explore), and Spark is required if a CDAP application uses a Spark program.
All CDAP components can be installed on the same boxes as your Hadoop cluster, or on separate boxes that can connect to the Hadoop services.
CDAP depends on these services being present on the cluster. There are core dependencies, which must be running for CDAP system services to operate correctly, and optional dependencies, which may be required for certain functionality or program types.
The host(s) running the CDAP Master service must have the HBase, HDFS, and YARN clients installed, because CDAP uses their command line clients for initialization and their connectivity information to reach external service dependencies. If Hadoop system services are also running on the same hosts as the CDAP services, those hosts will already have these clients installed.
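As a quick pre-installation check, the client requirement above can be sketched in a few lines of Python. This is only an illustration, not part of CDAP: it assumes the clients are exposed under their usual launcher names (`hbase`, `hdfs`, `yarn`) and are on the `PATH` of the user who will run CDAP Master. The `missing_clients` helper and the `REQUIRED_CLIENTS` constant are hypothetical names introduced here.

```python
import shutil

# Command line clients the CDAP Master host needs, per the text above.
# Assumption: the launchers use their usual names and are on PATH;
# adjust the names if your distribution installs them differently.
REQUIRED_CLIENTS = ("hbase", "hdfs", "yarn")


def missing_clients(required=REQUIRED_CLIENTS, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_clients()
    if missing:
        print("Missing Hadoop clients:", ", ".join(missing))
    else:
        print("All required Hadoop clients were found on PATH.")
```

Finding a launcher on the `PATH` only shows that the client is installed; it does not confirm that the client is configured to reach the cluster.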
Core Dependencies
HBase: For system runtime storage and queues
HDFS: The backing file system for distributed storage
YARN: For running system services in containers on cluster NodeManagers
MapReduce2: For batch operations in workflows and data exploration (included with YARN)
ZooKeeper: For service discovery and leader election
Optional Dependencies
Hive: For data exploration using SQL queries via the CDAP Explore system service
Spark: For running Spark programs within CDAP applications
Hadoop/HBase Environment
For a Distributed CDAP cluster, version 6.2.0 and later, you must install these Hadoop components (see notes following the tables):
Component | Source | Supported Versions |
---|---|---|
Hadoop | various | 2.0 and higher |
HBase | Apache | 0.98.x and 1.2 |
HBase | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
HBase | Hortonworks Data Platform (HDP) | 2.0 through 2.6 (Note 4) |
HBase | Amazon Hadoop (EMR) | 4.6 through 4.8 (with Apache HBase) |
HDFS | Apache Hadoop | 2.0.2-alpha through 2.6 |
HDFS | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
HDFS | Hortonworks Data Platform (HDP) | 2.0 through 2.6 (Note 4) |
HDFS | Amazon Hadoop (EMR) | 4.6 through 4.8 |
YARN and MapReduce2 | Apache Hadoop | 2.0.2-alpha through 2.7 |
YARN and MapReduce2 | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
YARN and MapReduce2 | Hortonworks Data Platform (HDP) | 2.0 through 2.6 (Note 4) |
YARN and MapReduce2 | Amazon Hadoop (EMR) | 4.6 through 4.8 |
ZooKeeper | Apache | Version 3.4.3 through 3.4 |
ZooKeeper | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
ZooKeeper | Hortonworks Data Platform (HDP) | 2.0 through 2.6 (Note 4) |
ZooKeeper | Amazon Hadoop (EMR) | 4.6 through 4.8 |
For a Distributed CDAP cluster, version 6.2.0 and later, you can (optionally) install these Hadoop components, as required:
Component | Source | Supported Versions |
---|---|---|
Hive | Apache | Version 0.12.0 through 1.2.x |
Hive | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
Hive | Hortonworks Data Platform (HDP) | 2.0 through 2.6 (Note 4) |
Hive | Amazon Hadoop (EMR) | 4.6 through 4.8 |
Spark | Apache | Versions 1.6.x through 2.3.x |
Spark | Cloudera Distribution of Apache Hadoop (CDH) | 5.1 through 5.12 (Note 4) |
Spark | Hortonworks Data Platform (HDP) | 2.0 through 2.6.5 (Note 4) |
Spark | Amazon Hadoop (EMR) | 4.6 through 4.8 |
Note 1: The component versions shown in these tables are those that we have tested and are confident are suitable and compatible. Later versions of components may work, but have not necessarily been tested or confirmed compatible.
Note 2: Certain CDAP components need to reference your Hadoop, YARN, HBase, and Hive cluster configurations by adding those configurations to their class paths.
Note 3: Hive 0.12 is not supported for secure cluster configurations.
Note 4: An upcoming release of CDAP (scheduled for CDAP 4.3) will drop support for all versions older than CDH 5.4.11 or HDP 2.5.0.0 due to an Apache Hadoop Privilege Escalation Vulnerability.
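To compare an installed Apache component against the tested ranges in the tables above, a check along the following lines may help. It is a minimal sketch under stated assumptions, not a CDAP tool: the `TESTED_APACHE_RANGES` mapping, `parse_version`, and `is_tested` are hypothetical names, the ranges are copied by hand from the Apache rows of the tables, and pre-release suffixes such as `-alpha` are ignored. HBase is left out because its supported versions (0.98.x and 1.2) are two separate release lines rather than a single range.

```python
import re

# Tested Apache version ranges copied from the tables above, as
# (lowest, highest) tuples. A shorter upper bound such as (2, 3)
# covers every 2.3.x release.
TESTED_APACHE_RANGES = {
    "HDFS": ((2, 0, 2), (2, 6)),
    "YARN": ((2, 0, 2), (2, 7)),
    "ZooKeeper": ((3, 4, 3), (3, 4)),
    "Hive": ((0, 12, 0), (1, 2)),
    "Spark": ((1, 6), (2, 3)),
}


def parse_version(text):
    """Turn '2.3.1' or '2.0.2-alpha' into a tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_tested(component, version_text):
    """Return True if the version falls inside the tested range."""
    low, high = TESTED_APACHE_RANGES[component]
    version = parse_version(version_text)
    # Truncate the version to the length of the upper bound before
    # comparing, so that an upper bound of (2, 3) accepts 2.3.4.
    return version >= low and version[: len(high)] <= high
```

For example, `is_tested("Spark", "2.3.1")` is true and `is_tested("Spark", "2.4.0")` is false. As Note 1 says, a false result means only that the version has not been tested, not that it is known to be incompatible.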