Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, Spark, YARN, and ZooKeeper. Hive is an optional component; it is required to enable CDAP's ad-hoc querying capabilities (CDAP Explore).

All CDAP components can be installed on the same boxes as your Hadoop cluster, or on separate boxes that can connect to the Hadoop services.
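If CDAP runs on separate boxes, it is worth confirming up front that they can actually reach the Hadoop services. Below is a minimal pre-flight sketch (not part of CDAP) in Python; the hostnames are placeholders and the ports are common defaults (ZooKeeper 2181, HDFS NameNode 8020, YARN ResourceManager 8032, HBase Master 16000), so substitute the values your cluster uses.

import socket

# Hypothetical endpoints for the required Hadoop services; replace the
# hostnames and ports with the values configured for your cluster.
SERVICES = {
    "ZooKeeper":            ("zk.example.com", 2181),
    "HDFS NameNode":        ("namenode.example.com", 8020),
    "YARN ResourceManager": ("resourcemanager.example.com", 8032),
    "HBase Master":         ("hbase-master.example.com", 16000),
}

def reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "OK" if reachable(host, port) else "UNREACHABLE"
        print(f"{name:22s} {host}:{port} -> {status}")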

...

  • HBase: For system runtime storage and queues

  • HDFS: The backing file system for distributed storage

  • Spark: For running Spark programs within CDAP applications

  • YARN: For running system services in containers on cluster NodeManagers

  • MapReduce2: For batch operations in workflows and data exploration (included with YARN)

  • ZooKeeper: For service discovery and leader election

...

  • Hive: For data exploration using SQL queries via the CDAP Explore system service


Hadoop/HBase Environment

For a Distributed CDAP cluster, version 6.2.0 and later, you must install these Hadoop components (see notes following the tables):

Component            Source               Supported Versions
---------            ------               ------------------
Hadoop               various              2.0 and higher
HBase                Apache               0.98.x and 1.2
                     Amazon Hadoop (EMR)  4.6 through 4.8 (with Apache HBase)
HDFS                 Apache Hadoop        2.0.2-alpha through 2.6
                     Amazon Hadoop (EMR)  4.6 through 4.8
Spark                Apache               2.4 and higher, running on Scala 2.12
                     Amazon Hadoop (EMR)  4.6 through 4.8
YARN and MapReduce2  Apache Hadoop        2.0.2-alpha through 2.7
                     Amazon Hadoop (EMR)  4.6 through 4.8
ZooKeeper            Apache               3.4.3 through 3.4.x
                     Amazon Hadoop (EMR)  4.6 through 4.8
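
As a quick cross-check against the table above, a small script can compare the versions reported by locally installed clients with the Apache minimums. The sketch below is an illustration only, not part of the CDAP distribution; it assumes the hadoop and hbase command-line clients are on the PATH, and its parsing of their version output is a best-effort assumption.

import re
import subprocess

# Minimum versions taken from the Apache rows of the table above.
# Only the lower bound is checked here; upper bounds (e.g. HDFS "through 2.6")
# still need to be verified by hand.
MINIMUMS = {
    "Hadoop": (["hadoop", "version"], "2.0"),
    "HBase":  (["hbase", "version"], "0.98"),
}

def reported_version(command):
    """Run a version command and pull the first x.y[.z]-looking token from its output."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    match = re.search(r"\d+\.\d+(?:\.\d+)?", result.stdout)
    return match.group(0) if match else None

def version_tuple(version):
    """Turn '2.7.3' into (2, 7, 3) for a simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))

if __name__ == "__main__":
    for name, (command, minimum) in MINIMUMS.items():
        version = reported_version(command)
        if version is None:
            print(f"{name}: client not found or no version reported")
        elif version_tuple(version) >= version_tuple(minimum):
            print(f"{name} {version}: meets the {minimum}+ minimum")
        else:
            print(f"{name} {version}: below the supported minimum {minimum}")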

...