Hadoop Compatibility

Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, Spark, YARN, and ZooKeeper. 

All CDAP components can be installed on the same hosts as your Hadoop cluster, or on separate hosts that can connect to the Hadoop services.

CDAP depends on these services being present on the cluster. The core dependencies listed below must be running for CDAP system services to operate correctly.

The host(s) running the CDAP Master service must have the HBase, HDFS, and YARN clients installed. CDAP uses these command-line clients during initialization and relies on their connectivity information to reach its external service dependencies. If Hadoop system services also run on the same hosts as the CDAP services, those hosts will already have these clients installed.
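One quick way to confirm this prerequisite on a candidate CDAP Master host is to check that the client commands resolve on the PATH. The Python sketch below assumes the clients are invoked as `hbase`, `hdfs`, and `yarn` (the usual executable names in Apache distributions) and uses a hypothetical helper, `missing_clients`. It only checks that the commands exist; it does not verify client versions or connectivity.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Client commands assumed to be needed on the CDAP Master host.
REQUIRED_CLIENTS = ("hbase", "hdfs", "yarn")


def missing_clients(
    clients: Iterable[str] = REQUIRED_CLIENTS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of clients that cannot be found on the PATH.

    `which` defaults to shutil.which and can be replaced for testing.
    """
    return [name for name in clients if which(name) is None]


if __name__ == "__main__":
    missing = missing_clients()
    if missing:
        print("Missing clients on this host: " + ", ".join(missing))
    else:
        print("HBase, HDFS, and YARN clients found on PATH")
```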

Core Dependencies

  • HBase: For system runtime storage and queues

  • HDFS: The backing file system for distributed storage

  • Spark: For running Spark programs within CDAP applications

  • YARN: For running system services in containers on cluster NodeManagers

  • MapReduce2: For batch operations in workflows and data exploration (included with YARN)

  • ZooKeeper: For service discovery and leader election

 

Hadoop/HBase Environment

For a Distributed CDAP cluster (CDAP 6.2.0 and later), you must install these Hadoop components (see the notes following the table):

Component             Source                 Supported Versions
-------------------   -------------------    -----------------------------------
Hadoop                various                2.6.5 and higher
HBase                 Apache                 0.98.x and 1.2
HBase                 Amazon Hadoop (EMR)    4.6 through 4.8 (with Apache HBase)
HDFS                  Apache Hadoop          2.0.2-alpha through 2.6
HDFS                  Amazon Hadoop (EMR)    4.6 through 4.8
Spark                 Apache                 Versions 2.4+ running on Scala 2.12
Spark                 Amazon Hadoop (EMR)    4.6 through 4.8
YARN and MapReduce2   Apache Hadoop          2.0.2-alpha through 2.7
YARN and MapReduce2   Amazon Hadoop (EMR)    4.6 through 4.8
ZooKeeper             Apache                 Version 3.4.3 through 3.4
ZooKeeper             Amazon Hadoop (EMR)    4.6 through 4.8

Note 1: The component versions shown in this table are those we have tested and are confident are suitable and compatible. Later versions may work, but have not necessarily been tested or confirmed as compatible.
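Because Note 1 separates tested versions from later versions that merely may work, a pre-install check could report where a cluster's versions fall relative to the table rather than rejecting them outright. The Python sketch below is illustrative only: `TESTED_RANGES`, `parse_version`, and `classify` are hypothetical names; it encodes only the Apache rows (HBase, whose row lists two separate release lines, and the Amazon EMR rows are omitted); it ignores pre-release suffixes such as `-alpha`; and it does not check which Scala version Spark was built with.

```python
import re
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]

# Apache rows from the table, as (minimum, highest listed release line).
# None means the table gives no upper bound.
TESTED_RANGES: Dict[str, Tuple[Version, Optional[Version]]] = {
    "hadoop": ((2, 6, 5), None),       # 2.6.5 and higher
    "hdfs": ((2, 0, 2), (2, 6)),       # 2.0.2-alpha through 2.6
    "yarn": ((2, 0, 2), (2, 7)),       # 2.0.2-alpha through 2.7
    "spark": ((2, 4), None),           # 2.4+ (Scala version not checked)
    "zookeeper": ((3, 4, 3), (3, 4)),  # 3.4.3 through 3.4
}


def parse_version(text: str) -> Version:
    """Turn '2.0.2-alpha' into (2, 0, 2), dropping non-numeric suffixes."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def classify(component: str, version: str) -> str:
    """Report whether a version is below, within, or above the listed range."""
    minimum, highest_line = TESTED_RANGES[component]
    v = parse_version(version)
    if v < minimum:
        return "below listed range"
    # Compare only as many fields as the upper bound names, so that
    # "through 3.4" accepts 3.4.3, 3.4.14, and so on.
    if highest_line is not None and v[: len(highest_line)] > highest_line:
        return "above listed range"
    return "within listed range"
```

For example, `classify("zookeeper", "3.5.7")` reports `above listed range`, which under Note 1 means "may work, but not confirmed" rather than "unsupported".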

Note 2: Certain CDAP components need to reference your Hadoop, YARN, and HBase cluster configurations by adding those configurations to their class paths.
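As a rough illustration of Note 2: Hadoop, YARN, and HBase clients load files such as core-site.xml, yarn-site.xml, and hbase-site.xml as classpath resources, so it is the configuration directories themselves that go on the classpath. The Python sketch below is not CDAP's own mechanism. It assumes the conventional HADOOP_CONF_DIR, YARN_CONF_DIR, and HBASE_CONF_DIR environment variables, falls back to the commonly used /etc/hadoop/conf and /etc/hbase/conf locations (an assumption about your distribution), and uses a hypothetical helper, `config_classpath`.

```python
import os
from typing import Dict, List, Mapping, Optional

# Conventional environment variables for each configuration directory, with
# assumed default locations. Adjust the defaults to match your distribution.
CONF_DIRS: Dict[str, str] = {
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
    "YARN_CONF_DIR": "/etc/hadoop/conf",
    "HBASE_CONF_DIR": "/etc/hbase/conf",
}


def config_classpath(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a classpath string listing each config directory once, in order."""
    env = os.environ if env is None else env
    entries: List[str] = []
    for var, default in CONF_DIRS.items():
        path = env.get(var, default)
        # YARN often shares Hadoop's configuration directory; skip duplicates.
        if path and path not in entries:
            entries.append(path)
    return os.pathsep.join(entries)
```

The resulting string could be prepended to a JVM's classpath so the cluster's site files are found before any defaults packaged in JARs.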

 

Created in 2020 by Google Inc.