Hadoop Compatibility
Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, Spark, YARN, and ZooKeeper.
All CDAP components can be installed on the same boxes as your Hadoop cluster, or on separate boxes that can connect to the Hadoop services.
CDAP depends on these services being present on the cluster. The core dependencies listed below must be running for CDAP system services to operate correctly.
The host(s) running the CDAP Master service must have the HBase, HDFS, and YARN clients installed, because CDAP uses their command-line clients for initialization and their connectivity information for external service dependencies. If Hadoop system services are also running on the same hosts as the CDAP services, those hosts will already have these clients installed.
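As a quick pre-installation check, the client requirement above can be sketched in Python. This is a minimal sketch, not part of CDAP: it assumes the clients are exposed under the usual command names `hbase`, `hdfs`, and `yarn`, and the helper name `find_missing_clients` is hypothetical.

```python
import shutil

# Command names of the Hadoop clients (an assumption; adjust them if your
# distribution installs the clients under different names).
REQUIRED_CLIENTS = ("hbase", "hdfs", "yarn")


def find_missing_clients(clients=REQUIRED_CLIENTS, path=None):
    """Return the client commands that cannot be found on the search path.

    When `path` is None, shutil.which searches the PATH environment variable.
    """
    return [name for name in clients if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_clients()
    if missing:
        print("Missing Hadoop clients:", ", ".join(missing))
    else:
        print("All required Hadoop clients were found on PATH.")
```

Run it on each host that will run the CDAP Master service. It only confirms that the commands are on the search path, not that they can reach the cluster.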
Core Dependencies
HBase: For system runtime storage and queues
HDFS: The backing file system for distributed storage
Spark: For running Spark programs within CDAP applications
YARN: For running system services in containers on cluster NodeManagers
MapReduce2: For batch operations in workflows and data exploration (included with YARN)
ZooKeeper: For service discovery and leader election
Hadoop/HBase Environment
For a Distributed CDAP cluster, version 6.2.0 and later, you must install these Hadoop components (see the notes following the table):
Component | Source | Supported Versions |
---|---|---|
Hadoop | various | 2.6.5 and higher |
HBase | Apache | 0.98.x and 1.2 |
HBase | Amazon Hadoop (EMR) | 4.6 through 4.8 (with Apache HBase) |
HDFS | Apache Hadoop | 2.0.2-alpha through 2.6 |
HDFS | Amazon Hadoop (EMR) | 4.6 through 4.8 |
Spark | Apache | Versions 2.4+ running on Scala 2.12 |
Spark | Amazon Hadoop (EMR) | 4.6 through 4.8 |
YARN and MapReduce2 | Apache Hadoop | 2.0.2-alpha through 2.7 |
YARN and MapReduce2 | Amazon Hadoop (EMR) | 4.6 through 4.8 |
ZooKeeper | Apache | Version 3.4.3 through 3.4 |
ZooKeeper | Amazon Hadoop (EMR) | 4.6 through 4.8 |
Note 1: The component versions shown in the table above are those that we have tested and are confident are suitable and compatible. Later versions of components may work, but have not necessarily been tested or confirmed compatible.
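To compare an installed component version against a tested range, a small helper along these lines may be useful. It is only a sketch under stated assumptions: the function names are hypothetical, pre-release suffixes such as `-alpha` are ignored, and an upper bound like "through 2.7" is read as "any release in the 2.7 line or earlier", which is an interpretation rather than something the table states.

```python
import re


def parse_version(text):
    """Turn '2.6.5' or '2.0.2-alpha' into a tuple of ints such as (2, 6, 5).

    Any suffix after the numeric part (for example '-alpha') is ignored.
    """
    numeric = re.match(r"\d+(?:\.\d+)*", text.strip())
    if numeric is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in numeric.group().split("."))


def in_tested_range(version, minimum, maximum=None):
    """Return True if `version` lies within the tested range.

    The maximum is compared by prefix (an assumption), so a maximum of
    '2.7' accepts 2.7.3 but rejects 2.8.0. With no maximum, only the
    minimum is enforced.
    """
    v = parse_version(version)
    if v < parse_version(minimum):
        return False
    if maximum is None:
        return True
    upper = parse_version(maximum)
    return v[:len(upper)] <= upper


if __name__ == "__main__":
    # Ranges taken from the table above.
    print(in_tested_range("2.7.3", "2.0.2-alpha", "2.7"))  # YARN and MapReduce2
    print(in_tested_range("2.6.4", "2.6.5"))               # Hadoop
```

A result of `False` means only that the version falls outside the tested range; as the note says, later versions may still work.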
Note 2: Certain CDAP components need to reference your Hadoop, YARN, and HBase cluster configurations by adding those configurations to their class paths.
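Before referencing those configurations, it can help to confirm that the files exist on the host. The sketch below does only that; it does not add anything to a class path. The directories `/etc/hadoop/conf` and `/etc/hbase/conf` are common conventions assumed here for illustration, and the helper name `missing_config_files` is hypothetical.

```python
from pathlib import Path

# Conventional locations of the cluster configuration files. Both the
# directories and the grouping of files are assumptions; adjust them to
# match your cluster.
EXPECTED_CONFIGS = {
    "/etc/hadoop/conf": ("core-site.xml", "hdfs-site.xml", "yarn-site.xml"),
    "/etc/hbase/conf": ("hbase-site.xml",),
}


def missing_config_files(expected=EXPECTED_CONFIGS):
    """Return the paths of expected configuration files that do not exist."""
    return [
        str(Path(directory) / name)
        for directory, names in expected.items()
        for name in names
        if not (Path(directory) / name).is_file()
    ]


if __name__ == "__main__":
    for path in missing_config_files():
        print("Missing configuration file:", path)
```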
Created in 2020 by Google Inc.