This section lists the hardware, memory, core, and network requirements and the software prerequisites for CDAP. Complete the requirements and instructions below before installing the CDAP components.

Network Requirements

CDAP components communicate over your network with HBase, HDFS, and YARN. For the best performance, CDAP components should be located on the same LAN, ideally running at 1 Gbps or faster. A good rule of thumb is to treat CDAP components as you would Hadoop datanodes.

See the section below (CDAP and Firewalls) for information on configuring CDAP with a firewall and the listening ports that are used.

Software Prerequisites

You'll need this software installed:

  • Java runtime on each CDAP node and Hadoop datanode.

  • A Hadoop, HBase, Hive (and optionally Spark) environment to run against.

  • To use the ad-hoc querying capabilities of CDAP, ensure the cluster has a compatible version of Hive installed. See the section on Hadoop Compatibility.

  • If Hive is not going to be installed, you will need to disable the CDAP Explore Service, as by default it is enabled. The installation instructions describe how to configure this.

  • CDAP nodes require Hadoop and HBase client installation and configuration. Note: no Hadoop services need to be running on these nodes.

  • We recommend installing an NTP (Network Time Protocol) daemon on all nodes of the cluster, including those with CDAP components.

Java Runtime

The latest JDK or JRE version 1.8.xx for Linux, Windows, or Mac OS X must be installed in your environment; we recommend the Oracle JDK.

To check the Java version installed, run the command:

$ java -version

CDAP is tested with both the Oracle JDK and OpenJDK; it may work with other JDKs, but it has not been tested with them.
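As a rough sketch of how the version check could be scripted, the helper below tests whether a `java -version` banner line reports a 1.8.x runtime. The function name `is_java_1_8` is hypothetical, and the match relies on the common `version "1.8.` banner format, which is an assumption about your JDK vendor's output.

```shell
# Succeeds if a `java -version` banner line reports a 1.8.x runtime.
# Assumes the banner contains the text: version "1.8.
is_java_1_8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query Java if it is on the PATH; `java -version` writes to stderr.
if command -v java >/dev/null 2>&1; then
  banner=$(java -version 2>&1 | head -n 1)
  if is_java_1_8 "$banner"; then
    echo "Java 1.8 found: $banner"
  else
    echo "Java 1.8 not found, got: $banner"
  fi
fi
```

The check is run on each CDAP node and Hadoop datanode, since each of them needs the Java runtime.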

Once you have installed the JDK, you'll need to set the JAVA_HOME environment variable.
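One possible way to set JAVA_HOME on Linux is to derive it from the location of the `java` binary. This is only a sketch: the helper name `java_home_from_binary` is hypothetical, `readlink -f` is assumed to be available (GNU coreutils), and the resulting directory should be checked against your actual JDK installation.

```shell
# Strip the trailing /bin/java from a java binary path to get a JAVA_HOME candidate.
java_home_from_binary() {
  printf '%s\n' "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
  # readlink -f follows symlinks such as /usr/bin/java to the real binary.
  JAVA_HOME=$(java_home_from_binary "$(readlink -f "$(command -v java)")")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
fi
```

The export only lasts for the current shell session. To make it permanent, the same `export JAVA_HOME=...` line would typically go in a shell profile or in the environment file your distribution uses.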

NTP (Network Time Protocol)

Installing NTP on RPM using Yum

  1. Install the NTP service and dependencies:

    $ sudo yum install ntp ntpdate ntp-doc
    
  2. Set the service to start at reboot:

    $ sudo chkconfig ntpd on
    
  3. Start the NTP server. This will continuously adjust the system time from an upstream NTP server:

    $ sudo /etc/init.d/ntpd start
    
  4. Synchronize the system clock with the pool.ntp.org server. Run this command only once:

    $ sudo ntpdate -u pool.ntp.org
    
  5. Synchronize the hardware clock (to prevent synchronization problems), unless on a virtual server:

    $ sudo hwclock --systohc
    

Installing NTP on Debian using APT

  1. Install the NTP service and dependencies:

    $ sudo apt-get install ntp
    
  2. Start the NTP server. This will continuously adjust the system time from an upstream NTP server:

    $ sudo service ntp start
    
  3. Synchronize the system clock with the pool.ntp.org server. Run this command only once:

    $ sudo ntpdate -u pool.ntp.org
    
  4. Synchronize the hardware clock (to prevent synchronization problems), unless on a virtual server:

    $ sudo hwclock --systohc
    

NTP Troubleshooting and Configuration

  • To check the synchronization:

    $ ntpq -p
    
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    +173.44.32.10    18.26.4.105      2 u    5   64    1   78.786   -0.157   1.966
    *66.241.101.63   132.163.4.103    2 u    7   64    1   43.085    2.872   0.409
    +services.quadra 198.60.22.240    2 u    6   64    1   21.805    3.040   1.033
    -hydrogen.consta 200.98.196.212   2 u    7   64    1  114.250   16.011   0.873
    
  • If you need to adjust the configuration (add or delete servers, use servers closer to you, etc.):

    $ vi /etc/ntp.conf
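In the `ntpq -p` output, the row marked with `*` is the peer that ntpd has currently selected, and the offset column is reported in milliseconds. The sketch below extracts that offset so it can be checked in a script. The function name `selected_peer_offset` is hypothetical, and the sample rows are the ones shown above.

```shell
# Print the offset (in milliseconds) of the peer marked "*" in `ntpq -p` output.
# Reads the ntpq output on stdin; offset is the ninth whitespace-separated column.
selected_peer_offset() {
  awk '$1 ~ /^\*/ { print $9; exit }'
}

# Sample input copied from the output shown above.
selected_peer_offset <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+173.44.32.10    18.26.4.105      2 u    5   64    1   78.786   -0.157   1.966
*66.241.101.63   132.163.4.103    2 u    7   64    1   43.085    2.872   0.409
EOF
```

On a live node the same function would be fed with `ntpq -p | selected_peer_offset`. If no row is marked with `*`, nothing is printed, which usually means ntpd has not yet selected a peer.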
    

CDAP and Firewalls

In general, your cluster configuration cannot have a firewall between the cluster and CDAP. Instead, if a firewall is used, the cluster and certain CDAP components need to be together behind the firewall. These are the ports which can be opened to provide external access:

Listen Ports for External Access

| Description | Governing Configuration | Default Value in Packages/MapR | Default Value in Ambari/Cloudera Manager |
|---|---|---|---|
| CDAP Router listen port (HTTP RESTful) | router.bind.port | 11015 | 11015 |
| CDAP Router listen port (HTTP RESTful) (SSL) | router.ssl.bind.port | 10443 | 10443 |
| CDAP UI listen port | dashboard.bind.port | 11011 | 11011 |
| CDAP UI listen port (SSL) | dashboard.ssl.bind.port | 9443 | 9443 |
| CDAP Auth Server listen port | security.auth.server.bind.port | 10009 | 10009 |
| CDAP Auth Server listen port (SSL) | security.auth.server.ssl.bind.port | 10010 | 10010 |
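To see which of these listen ports are reachable from a client machine, a small probe along the following lines could be used. This is a sketch under assumptions: the function names and the `CDAP_HOST` variable are hypothetical, the port values are the defaults listed above, and the probe needs `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command.

```shell
# Default external listen ports, copied from the defaults listed above.
cdap_default_ports() {
  cat <<'EOF'
router.bind.port 11015
router.ssl.bind.port 10443
dashboard.bind.port 11011
dashboard.ssl.bind.port 9443
security.auth.server.bind.port 10009
security.auth.server.ssl.bind.port 10010
EOF
}

# Succeeds if a TCP connection to host ($1) and port ($2) opens within 2 seconds.
port_is_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one line per default port with its state on the given host.
report_ports() {
  cdap_default_ports | while read -r name port; do
    if port_is_open "$1" "$port"; then state=open; else state=closed; fi
    echo "$name $port $state"
  done
}

# Example call (CDAP_HOST is a placeholder for your Router/UI/Auth Server host):
# report_ports "${CDAP_HOST:-localhost}"
```

If any of the governing properties have been changed from their defaults, the list in `cdap_default_ports` would need to be edited to match.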

The exact configuration and ports required will vary depending on your use of firewalls and your specific configuration. This diagram shows a likely scenario that you could use:

In this diagram, we show the CDAP Router "traversing" the firewall. Note that the CDAP UI can be completely outside of the firewall, as it only needs to talk to clients, the CDAP Router, and the CDAP Auth Server. These two services (Router and Auth Server) must be accessible to outside users, but they must also be able to connect to nodes within the cluster. They need unrestricted client access to the cluster, with the ability to establish connections to cluster nodes on any port that a container may choose to open.

Taking this same picture, if the firewall were moved to the left of the CDAP Router/Auth Server, then two ports (router.bind.port, 11015, and security.auth.server.bind.port, 10009) would need to be opened to allow clients to reach the hosts running the CDAP Router/Auth Server. There could be another firewall between the CDAP Router/Auth Server and the cluster, as long as it provides client access from the CDAP Auth Server to the ZooKeeper nodes. The same is true for the CDAP Router (access to the ZooKeeper nodes), except that it also needs unrestricted client access to the cluster, so it usually doesn't make sense to firewall the CDAP Router when you are essentially allowing all traffic through.
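For the scenario where the firewall sits in front of the CDAP Router/Auth Server hosts, the two ports could be opened with rules like the ones printed below. This is a sketch that assumes an iptables-based host firewall, which your environment may not use (firewalld or a network appliance would need its own equivalent). The function name `open_port_rule` is hypothetical, and the script only prints the commands so that they can be reviewed before being run as root.

```shell
# Print (not run) an iptables rule that accepts inbound TCP on the given port.
open_port_rule() {
  echo "iptables -A INPUT -p tcp --dport $1 -j ACCEPT"
}

# Default router.bind.port and security.auth.server.bind.port.
for port in 11015 10009; do
  open_port_rule "$port"
done
```

If SSL is enabled, the SSL listen ports (10443 and 10010 by default) would be the ones to open instead.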

As your configuration can vary from these descriptions, this information is intended to guide you in understanding what the different components require in order to successfully run CDAP rather than provide strict requirements.
