
Before installing the CDAP components, you must first install (or have access to) a Hadoop cluster with HBase, HDFS, YARN, and ZooKeeper. Hive and Spark are optional components: Hive is required to enable CDAP's ad-hoc querying capabilities (CDAP Explore), and Spark is required if a CDAP application uses Spark programs.

All CDAP components can be installed on the same boxes as your Hadoop cluster, or on separate boxes that can connect to the Hadoop services.

CDAP depends on these services being present on the cluster. There are core dependencies, which must be running for CDAP system services to operate correctly, and optional dependencies, which may be required for certain functionality or program types.

The host(s) running the CDAP Master service must have the HBase, HDFS, and YARN clients installed, as CDAP uses the command line clients of these for initialization and their connectivity information for external service dependencies. If Hadoop system services are also running on the same hosts as the CDAP services, they will already have these clients installed.
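As a quick pre-flight check before installing CDAP Master, you can verify that the required client CLIs are on the host's PATH. This is a minimal sketch, assuming the clients are installed as the usual `hbase`, `hdfs`, and `yarn` commands:

```shell
#!/bin/sh
# Pre-flight check (sketch): verify that the client CLIs CDAP Master
# depends on (hbase, hdfs, yarn) are available on this host's PATH.
check_clients() {
  for client in hbase hdfs yarn; do
    if command -v "$client" >/dev/null 2>&1; then
      echo "$client client: found at $(command -v "$client")"
    else
      echo "$client client: MISSING (install the $client client package)"
    fi
  done
}

check_clients
```

Run this on each host that will run the CDAP Master service; any line reporting MISSING must be resolved before installation.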

Core Dependencies

  • HBase: For system runtime storage and queues

  • HDFS: The backing file system for distributed storage

  • YARN: For running system services in containers on cluster NodeManagers

  • MapReduce2: For batch operations in workflows and data exploration (included with YARN)

  • ZooKeeper: For service discovery and leader election

Optional Dependencies

  • Hive: For data exploration using SQL queries via the CDAP Explore system service

  • Spark: For running Spark programs within CDAP applications
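CDAP is pointed at these cluster services through its configuration file, cdap-site.xml. The fragment below is illustrative only: the hostnames are hypothetical, and you should confirm the exact property names and defaults against the documentation for your CDAP version.

```xml
<!-- Illustrative cdap-site.xml fragment (hostnames are placeholders) -->
<configuration>
  <!-- ZooKeeper ensemble used for service discovery and leader election -->
  <property>
    <name>zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/${root.namespace}</value>
  </property>
  <!-- Root HDFS directory for CDAP's distributed storage -->
  <property>
    <name>hdfs.namespace</name>
    <value>/cdap</value>
  </property>
</configuration>
```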

Hadoop/HBase Environment

For a Distributed CDAP cluster, version 6.2.0, you must install these Hadoop components (see notes following the tables):

Component            Source                                         Supported Versions
-------------------  ---------------------------------------------  -----------------------------------
Hadoop               various                                        2.0 and higher
HBase                Apache                                         0.98.x and 1.2
                     Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
                     Hortonworks Data Platform (HDP)                2.0 through 2.6 (Note 4)
                     Amazon Hadoop (EMR)                            4.6 through 4.8 (with Apache HBase)
HDFS                 Apache Hadoop                                  2.0.2-alpha through 2.6
                     Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
                     Hortonworks Data Platform (HDP)                2.0 through 2.6 (Note 4)
                     Amazon Hadoop (EMR)                            4.6 through 4.8
YARN and MapReduce2  Apache Hadoop                                  2.0.2-alpha through 2.7
                     Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
                     Hortonworks Data Platform (HDP)                2.0 through 2.6 (Note 4)
                     Amazon Hadoop (EMR)                            4.6 through 4.8
ZooKeeper            Apache                                         3.4.3 through 3.4
                     Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
                     Hortonworks Data Platform (HDP)                2.0 through 2.6 (Note 4)
                     Amazon Hadoop (EMR)                            4.6 through 4.8

For a Distributed CDAP cluster, version 6.2.0, you can (optionally) install these Hadoop components, as required:

Component  Source                                         Supported Versions
---------  ---------------------------------------------  ---------------------------
Hive       Apache                                         0.12.0 through 1.2.x
           Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
           Hortonworks Data Platform (HDP)                2.0 through 2.6 (Note 4)
           Amazon Hadoop (EMR)                            4.6 through 4.8
Spark      Apache                                         1.6.x through 2.3.x
           Cloudera Distribution of Apache Hadoop (CDH)   5.1 through 5.12 (Note 4)
           Hortonworks Data Platform (HDP)                2.0 through 2.6.5 (Note 4)
           Amazon Hadoop (EMR)                            4.6 through 4.8

Note 1: The component versions shown in these tables are those we have tested and confirmed as suitable and compatible. Later versions of a component may work, but have not necessarily been tested or confirmed compatible.

Note 2: Certain CDAP components need to reference your Hadoop, YARN, HBase, and Hive cluster configurations by adding those configurations to their class paths.

Note 3: Hive 0.12 is not supported for secure cluster configurations.

Note 4: An upcoming release of CDAP (scheduled for CDAP 4.3) will drop support for all versions older than CDH 5.4.11 or HDP 2.5.0.0 due to an Apache Hadoop Privilege Escalation Vulnerability.
