This section lists the hardware, memory, core, and network requirements, and the software prerequisites, that must be met before you install the CDAP components.
Software Prerequisites
You'll need this software installed:
A Java runtime on each CDAP node and Hadoop datanode.
A Hadoop, HBase, Hive (and optionally Spark) environment to run against.
To use the ad-hoc querying capabilities of CDAP, ensure the cluster has a compatible version of Hive installed. See the section on Hadoop Compatibility.
If Hive is not going to be installed, you will need to disable the CDAP Explore Service, as by default it is enabled. The installation instructions describe how to configure this.
CDAP nodes require Hadoop and HBase client installation and configuration. Note: No Hadoop services need actually be running.
We recommend installing an NTP (Network Time Protocol) daemon on all nodes of the cluster, including those with CDAP components.
Java Runtime
The latest JDK or JRE version 1.8.xx for Linux, Windows, or Mac OS X must be installed in your environment; we recommend the Oracle JDK.
To check the Java version installed, run the command:
$ java -version
CDAP is tested with both the Oracle JDK and OpenJDK; it may work with other JDKs, but it has not been tested with them.
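The version check can also be scripted, for example when verifying many nodes. This is a minimal sketch, not part of CDAP: the helper name `is_java_8` is hypothetical, and it assumes the usual banner format that `java -version` prints on standard error (for example, `java version "1.8.0_121"`).

```shell
# Hypothetical helper: succeed if a `java -version` banner line reports a 1.8.x runtime.
is_java_8() {
  # $1 is the banner line, e.g.: java version "1.8.0_121"
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, so redirect it before filtering.
  banner=$(java -version 2>&1 | grep 'version' | head -n 1)
  if is_java_8 "$banner"; then
    echo "OK: $banner"
  else
    echo "Unsupported Java version: $banner" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```

The check only matches the version string; it does not distinguish between a JDK and a JRE.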
Once you have installed the JDK, you'll need to set the JAVA_HOME environment variable.
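One way to set `JAVA_HOME` is to derive it from the `java` binary already on the `PATH`. The sketch below is illustrative only: the helper name `detect_java_home` is hypothetical, and it assumes a Linux node where `readlink -f` (GNU coreutils) is available and the binary lives under `<install-dir>/bin/java`.

```shell
# Hypothetical helper: print the installation directory of the java binary
# found on the PATH, or fail if there is none.
detect_java_home() {
  java_bin=$(command -v java) || return 1
  # Follow symlinks such as /usr/bin/java (readlink -f assumes GNU coreutils).
  java_real=$(readlink -f "$java_bin") || return 1
  # Drop the trailing /bin/java to get the installation directory.
  dirname "$(dirname "$java_real")"
}

if detected=$(detect_java_home); then
  export JAVA_HOME="$detected"
  echo "JAVA_HOME=$JAVA_HOME"
else
  echo "java not found on PATH; set JAVA_HOME manually" >&2
fi
```

To make the setting persistent, the resulting `export JAVA_HOME=...` line would typically go into a shell profile or service environment file; which file applies depends on your distribution.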
NTP (Network Time Protocol)
We recommend installing an NTP (Network Time Protocol) daemon on all nodes of the cluster, including those with CDAP components.
NTP requires that port 123 be open.
If your cluster does not have access to the internet, you can run a local version of NTP by setting up a master node as an NTP server.
Installing NTP on RPM using Yum
Install the NTP service and dependencies:
$ sudo yum install ntp ntpdate ntp-doc
Set the service to start at reboot:
$ sudo chkconfig ntpd on
Start the NTP server. This will continuously adjust the system time from an upstream NTP server:
$ sudo /etc/init.d/ntpd start
Synchronize the system clock with the 0.pool.ntp.org server. You should use this command only once:
$ sudo ntpdate -u pool.ntp.org
Synchronize the hardware clock (to prevent synchronization problems), unless on a virtual server:
$ sudo hwclock --systohc
Installing NTP on Debian using APT
Install the NTP service and dependencies:
$ sudo apt-get install ntp
Start the NTP server. This will continuously adjust the system time from an upstream NTP server:
$ sudo service ntp start
Synchronize the system clock with the 0.pool.ntp.org server. You should use this command only once:
$ sudo ntpdate -u pool.ntp.org
Synchronize the hardware clock (to prevent synchronization problems), unless on a virtual server:
$ sudo hwclock --systohc
NTP Troubleshooting and Configuration
To check the synchronization:
$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+173.44.32.10    18.26.4.105      2 u    5   64    1   78.786   -0.157   1.966
*66.241.101.63   132.163.4.103    2 u    7   64    1   43.085    2.872   0.409
+services.quadra 198.60.22.240    2 u    6   64    1   21.805    3.040   1.033
-hydrogen.consta 200.98.196.212   2 u    7   64    1  114.250   16.011   0.873
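In `ntpq -p` output, the peer currently selected for synchronization is marked with a leading `*`, and the `offset` column is reported in milliseconds. As a rough sketch for scripted checks, the hypothetical helper `selected_peer` below extracts that peer and its offset; it assumes the ten-column layout shown in the sample output.

```shell
# Hypothetical helper: read `ntpq -p` output on stdin and print the selected
# peer (the line marked with '*') followed by its offset in milliseconds.
selected_peer() {
  awk '/^\*/ { print substr($1, 2), $9 }'
}

# Sample rows taken from the output shown in this section.
# On a live node, run instead: ntpq -p | selected_peer
selected_peer <<'EOF'
+173.44.32.10 18.26.4.105 2 u 5 64 1 78.786 -0.157 1.966
*66.241.101.63 132.163.4.103 2 u 7 64 1 43.085 2.872 0.409
EOF
```

If no line starts with `*`, the helper prints nothing, which usually means the daemon has not yet selected a peer.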
If you need to adjust the configuration (add or delete servers, use servers closer to you, etc.):
$ vi /etc/ntp.conf
CDAP and Firewalls
In general, your cluster configuration cannot have a firewall between the cluster and CDAP. Instead, if a firewall is used, the cluster and certain CDAP components need to be together behind the firewall. These are the ports which can be opened to provide external access:
Listen Ports for External Access
Description | Governing Configuration | Default Value in Packages/MapR | Default Value in Ambari/Cloudera Manager |
---|---|---|---|
CDAP Router listen port (HTTP RESTful) | router.bind.port | 11015 | 11015 |
CDAP Router listen port (HTTP RESTful) (SSL) | | 10443 | 10443 |
CDAP UI listen port | | 11011 | 11011 |
CDAP UI listen port (SSL) | | 9443 | 9443 |
CDAP Auth Server listen port | security.auth.server.bind.port | 10009 | 10009 |
CDAP Auth Server listen port (SSL) | | 10010 | 10010 |
The exact configuration and ports required will vary depending on your use of firewalls and your specific configuration. This diagram shows a likely scenario that you could use:
In this diagram, we show the CDAP Router "traversing" the firewall. Note that the CDAP UI can be completely outside of the firewall, as it needs to talk to clients, the CDAP Router, and the CDAP Auth Server. These two services (Router and Auth Server) need to be accessible from the outside to users, but also must be able to connect to nodes within the cluster. They need unrestricted client access to the cluster with the ability to establish connections to cluster nodes, on any port that a container may choose to open.
Taking this same picture, if the firewall were moved to the left of the CDAP Router/Auth Server, then two ports (router.bind.port, 11015, and security.auth.server.bind.port, 10009) would need to be opened to allow clients to access the hosts running the CDAP Router/Auth Server. There could be another firewall between the CDAP Router/Auth Server and the cluster, as long as it provides client access from the CDAP Auth Server to the ZooKeeper nodes. The same is true for the CDAP Router (access to the ZooKeeper nodes), except that it also needs unrestricted client access to the cluster, so it usually doesn't make sense to firewall the CDAP Router when you are essentially allowing all traffic through.
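As an illustration of opening those two ports, the sketch below prints (but does not apply) firewall rules for the default values of router.bind.port and security.auth.server.bind.port. It assumes the hosts use `iptables`, which this guide does not prescribe; the helper name `print_accept_rule` is hypothetical, and the ports must be changed if you have overridden the defaults.

```shell
# Default ports for router.bind.port and security.auth.server.bind.port.
ROUTER_PORT=11015
AUTH_PORT=10009

# Hypothetical helper: print (not run) an iptables rule that accepts inbound
# TCP traffic on the given port. iptables itself is an assumption here.
print_accept_rule() {
  echo "iptables -A INPUT -p tcp --dport $1 -j ACCEPT"
}

for port in "$ROUTER_PORT" "$AUTH_PORT"; do
  print_accept_rule "$port"
done
```

Printing the rules first makes it possible to review them before running them with root privileges on the hosts that run the CDAP Router and CDAP Auth Server.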
Because your configuration may differ from these descriptions, this information is intended to help you understand what the different components require to run CDAP successfully, rather than to set strict requirements.