Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, YARN, and ZooKeeper. Hive and Spark are optional components: Hive is required to enable CDAP's ad-hoc querying capabilities (CDAP Explore), and Spark is required if a CDAP application uses the Spark program.
All CDAP components can be installed on the same boxes as your Hadoop cluster, or on separate boxes that can connect to the Hadoop services.
...
HBase: For system runtime storage and queues
HDFS: The backing file system for distributed storage
YARN: For running system services in containers on cluster NodeManagers
MapReduce2: For batch operations in workflows and data exploration (included with YARN)
ZooKeeper: For service discovery and leader election
...
Hive: For data exploration using SQL queries via the CDAP Explore system service
Spark: For running Spark programs within CDAP applications
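Before installing CDAP, it can help to confirm that the Hadoop services listed above are reachable from the hosts where CDAP will run. A minimal sketch, assuming the stock default ports for each service (your distribution may configure different ones):

```python
import socket

# Assumed default ports for the Hadoop services CDAP depends on;
# adjust these to match your distribution's configuration.
SERVICE_PORTS = {
    "HDFS NameNode": 8020,
    "YARN ResourceManager": 8032,
    "HBase Master": 16000,
    "ZooKeeper": 2181,
}

def check_services(host="localhost", timeout=2.0):
    """Return a mapping of service name -> True if its port accepts a TCP connection."""
    results = {}
    for name, port in SERVICE_PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            results[name] = s.connect_ex((host, port)) == 0
    return results

if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

A closed port does not necessarily mean the service is absent (it may be bound to another interface or port), but a quick connectivity pass like this catches the most common misconfigurations early.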
Hadoop/HBase Environment
For a Distributed CDAP cluster, version 6.2.0 and later, you must install these Hadoop components (see notes following the table):
Component | Source | Supported Versions
---|---|---
Hadoop | various | 2.0 and higher
HBase | Apache | 0.98.x and 1.2
HBase | Amazon Hadoop (EMR) | 4.6 through 4.8 (with Apache HBase)
HDFS | Apache Hadoop | 2.0.2-alpha through 2.6
HDFS | Amazon Hadoop (EMR) | 4.6 through 4.8
Spark | Apache | Versions 2.4+ running on Scala 2.12
Spark | Amazon Hadoop (EMR) | 4.6 through 4.8
YARN and MapReduce2 | Apache Hadoop | 2.0.2-alpha through 2.7
YARN and MapReduce2 | Amazon Hadoop (EMR) | 4.6 through 4.8
ZooKeeper | Apache | Version 3.4.3 through 3.4
ZooKeeper | Amazon Hadoop (EMR) | 4.6 through 4.8
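The version ranges in the table above can be checked mechanically before installation. A minimal sketch of comparing a detected dotted version string against a supported range (the helper names and sample values are illustrative, not part of CDAP):

```python
def parse_version(v):
    """Split a dotted version string into a tuple of ints ('2.6.0' -> (2, 6, 0)).

    Non-numeric suffixes such as '-alpha' are ignored for comparison.
    """
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def in_range(version, low, high):
    """True if low <= version <= high, comparing dotted versions numerically.

    The upper bound is compared only to its own precision, so '2.6' accepts
    any 2.6.x release.
    """
    v = parse_version(version)
    hi = parse_version(high)
    return parse_version(low) <= v and v[:len(hi)] <= hi

print(in_range("2.6.0", "2.0.2-alpha", "2.6"))  # True: within the HDFS range
```

Truncating the detected version to the precision of the upper bound is what makes an open-ended row like "2.0.2-alpha through 2.6" accept patch releases of 2.6.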
...