...
CDAP packages use a central configuration, stored by default in /etc/cdap.

When you install the CDAP base package, a default configuration is placed in /etc/cdap/conf.dist. The cdap-site.xml file is a placeholder where you can define your specific configuration for all CDAP components. The cdap-site.xml.example file shows the properties that usually require customization for all installations.

Similar to Hadoop, CDAP uses the alternatives framework to allow you to easily switch between multiple configurations. The alternatives system simplifies management by letting you choose between different directories that fulfill the same purpose.

Copy the contents of /etc/cdap/conf.dist into a directory of your choice (such as /etc/cdap/conf.mycdap) and make all of your customizations there. Then run the alternatives command to point the /etc/cdap/conf symlink to your custom directory /etc/cdap/conf.mycdap:

    $ sudo cp -r /etc/cdap/conf.dist /etc/cdap/conf.mycdap
    $ sudo update-alternatives --install /etc/cdap/conf cdap-conf /etc/cdap/conf.mycdap 10
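After switching configurations, it can be useful to confirm that the /etc/cdap/conf symlink really resolves to the custom directory. The following is a minimal sketch, not part of CDAP: conf_points_to is a hypothetical helper name, and it assumes a readlink that supports the -f option (as GNU coreutils does).

```shell
# Sketch: confirm that a configuration symlink resolves to the expected directory.
# conf_points_to is a hypothetical helper, not part of CDAP.
# Usage: conf_points_to LINK EXPECTED_DIR
conf_points_to() {
  local link="$1"
  local expected="$2"
  local actual wanted
  # Follow the whole symlink chain (the alternatives system may add an
  # intermediate link) and compare the final targets.
  actual="$(readlink -f "$link")" || return 1
  wanted="$(readlink -f "$expected")" || return 1
  [ "$actual" = "$wanted" ]
}

# Example usage, with the paths from the text above:
#   conf_points_to /etc/cdap/conf /etc/cdap/conf.mycdap && echo "conf.mycdap is active"
```

Running update-alternatives --display cdap-conf is another way to see which directory the alternatives system currently selects.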
Configure cdap-site.xml after you have installed the CDAP packages.

To configure your particular installation, modify cdap-site.xml, using cdap-site.xml.example as a model. (See the appendix for a listing of cdap-site.xml.example, the minimal cdap-site.xml file required.)

Customize your configuration by creating (or editing, if it already exists) the XML file conf/cdap-site.xml and setting the appropriate properties:

    $ sudo cp -f /etc/cdap/conf.mycdap/cdap-site.xml.example /etc/cdap/conf.mycdap/cdap-site.xml
    $ sudo vi /etc/cdap/conf.mycdap/cdap-site.xml
If necessary, customize the file cdap-env.sh after you have installed the CDAP packages.

Environment variables to be included in the environment used when launching CDAP can be set in the cdap-env.sh file, usually at /etc/cdap/conf/cdap-env.sh.

This is only necessary if you need to customize the environment that launches CDAP, as described below under Local Storage Configuration.
Depending on your installation, you may need to set these properties:
Check that the zookeeper.quorum property in conf/cdap-site.xml is set to the ZooKeeper quorum string, a comma-delimited list of fully-qualified domain names for the ZooKeeper quorum:

    <property>
      <name>zookeeper.quorum</name>
      <value>FQDN1:2181,FQDN2:2181/${root.namespace}</value>
      <description>
        ZooKeeper quorum string; specifies the ZooKeeper host:port;
        substitute the quorum for the components shown here (FQDN1:2181,FQDN2:2181)
      </description>
    </property>
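To illustrate the shape of the quorum string, here is a small sketch that assembles the comma-delimited host:port list from a port and a list of hostnames. build_zk_quorum is a hypothetical helper, not part of CDAP, and the hostnames in the usage note are placeholders. The /${root.namespace} suffix shown in the example property is not added by the helper and would be appended separately.

```shell
# Sketch: build the comma-delimited host:port part of a ZooKeeper quorum string.
# build_zk_quorum is a hypothetical helper, not part of CDAP.
# Usage: build_zk_quorum PORT HOST [HOST...]
build_zk_quorum() {
  local port="$1"
  local quorum=""
  local host
  shift
  for host in "$@"; do
    # Insert a comma only when the string already holds an entry.
    quorum="${quorum:+$quorum,}$host:$port"
  done
  printf '%s\n' "$quorum"
}
```

For example, build_zk_quorum 2181 zk1.example.com zk2.example.com prints zk1.example.com:2181,zk2.example.com:2181.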
Check that the router.server.address property in conf/cdap-site.xml is set to the hostname of the CDAP Router. The CDAP UI uses this property to connect to the Router:

    <property>
      <name>router.server.address</name>
      <value>{router-host-name}</value>
      <description>CDAP Router address to which CDAP UI connects</description>
    </property>
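Checks such as the two above can be scripted by reading a property value back out of the file. The sketch below is one possible approach using awk. get_site_property is a hypothetical helper, not a CDAP tool, and it assumes the simple layout shown in these examples, with each name element appearing before its value element and each element contained on a single line. It is not a general XML parser.

```shell
# Sketch: print the value of one property from a Hadoop-style site file.
# get_site_property is a hypothetical helper, not part of CDAP.
# Usage: get_site_property FILE PROPERTY_NAME
get_site_property() {
  awk -v want="$2" '
    /<name>/ {
      # Remember the most recent property name seen.
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      # Print the value that follows the wanted name, then stop.
      value = $0
      sub(/.*<value>[ \t]*/, "", value)
      sub(/[ \t]*<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$1"
}

# Example usage:
#   get_site_property /etc/cdap/conf/cdap-site.xml router.server.address
```

The helper prints nothing when the property is absent, so an empty result can be treated as "not set in this file".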
Check that there exists in HDFS a user directory for the hdfs.user property of conf/cdap-site.xml. By default, the HDFS user is yarn. If necessary, create the directory:

    $ su hdfs
    $ hadoop fs -mkdir -p /user/yarn && hadoop fs -chown yarn:yarn /user/yarn
If you want to use an HDFS directory with a name other than /cdap:

Create the HDFS directory you want to use, such as /myhadoop/myspace.

Create an hdfs.namespace property for the HDFS directory in conf/cdap-site.xml:

    <property>
      <name>hdfs.namespace</name>
      <value>/myhadoop/myspace</value>
      <description>Default HDFS namespace</description>
    </property>

Check that the default HDFS user yarn owns that HDFS directory.
If you want to use an HDFS user other than yarn, such as my_username:

Check that a corresponding user exists (creating it if necessary) on all machines in the cluster on which YARN is running (typically, all of the machines).

Create an hdfs.user property for that user in conf/cdap-site.xml:

    <property>
      <name>hdfs.user</name>
      <value>my_username</value>
      <description>User for accessing HDFS</description>
    </property>

Check that the HDFS user owns the HDFS directory described by hdfs.namespace on all machines.

Check that there exists in HDFS a /user/ directory for that HDFS user, as described above, such as:

    $ su hdfs
    $ hadoop fs -mkdir -p /user/my_username && hadoop fs -chown my_username:my_username /user/my_username
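The directory-creation commands for yarn and my_username follow the same pattern, which can be sketched as a small function that prints the command line for any user name. hdfs_home_cmds is a hypothetical helper, not part of CDAP. It only prints text and never runs hadoop, and it assumes, as in the examples above, that the owning group has the same name as the user.

```shell
# Sketch: print (but do not run) the HDFS commands that create and chown
# a /user/ directory for the given HDFS user.
# hdfs_home_cmds is a hypothetical helper, not part of CDAP.
# Assumption: the group name equals the user name, as in the examples above.
hdfs_home_cmds() {
  local user="$1"
  printf 'hadoop fs -mkdir -p /user/%s && hadoop fs -chown %s:%s /user/%s\n' \
    "$user" "$user" "$user" "$user"
}

# Example usage:
#   hdfs_home_cmds my_username
```

Printing the command first makes it easy to review before running it as the hdfs user.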
If you use an HDFS user other than yarn, you must either use a secure cluster or use the LinuxContainerExecutor instead of the DefaultContainerExecutor. (Because of how DefaultContainerExecutor works, other containers will launch as yarn rather than the specified hdfs.user.) On Kerberos-enabled clusters, you must use LinuxContainerExecutor, as the DefaultContainerExecutor will not work correctly.
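In Hadoop, the container executor is selected by the yarn.nodemanager.container-executor.class property in yarn-site.xml; that property belongs to YARN rather than to CDAP and is not described in this document. As a rough sketch, a NodeManager configuration file can be scanned for the LinuxContainerExecutor class name. uses_linux_container_executor is a hypothetical helper, the location of yarn-site.xml in the usage line is an assumption that varies by distribution, and a plain text search like this would also match the class name inside an XML comment.

```shell
# Sketch: succeed when the given yarn-site.xml mentions the
# LinuxContainerExecutor class anywhere in the file.
# uses_linux_container_executor is a hypothetical helper, not part of CDAP or Hadoop.
uses_linux_container_executor() {
  grep -q 'org\.apache\.hadoop\.yarn\.server\.nodemanager\.LinuxContainerExecutor' "$1"
}

# Example usage (the path is an assumption; adjust it for your distribution):
#   uses_linux_container_executor /etc/hadoop/conf/yarn-site.xml \
#     && echo "LinuxContainerExecutor is configured"
```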
To use the ad-hoc querying capabilities of CDAP, ensure the cluster has a compatible version of Hive installed. See the section on Hadoop Compatibility. To use this feature on secure Hadoop clusters, please see the instructions on configuring secure Hadoop.
Note: Some versions of Hive contain a bug that may prevent the CDAP Explore Service from starting up. See CDAP-1865 for more information about the issue. If the CDAP Explore Service fails to start and you see a javax.jdo.JDODataStoreException: Communications link failure in the log, try adding this property to the Hive hive-site.xml file:

    <property>
      <name>datanucleus.connectionPoolingType</name>
      <value>DBCP</value>
    </property>
If Hive is not going to be installed, disable the CDAP Explore Service in conf/cdap-site.xml (by default, it is enabled):

    <property>
      <name>explore.enabled</name>
      <value>false</value>
      <description>Enable Explore functionality</description>
    </property>
CDAP can publish notifications about metadata updates to an external Apache Kafka instance. For details on the configuration settings and an example output, see Audit logging.
...