...

  1. CDAP packages utilize a central configuration, stored by default in /etc/cdap.

    When you install the CDAP base package, a default configuration is placed in /etc/cdap/conf.dist. The cdap-site.xml file is a placeholder where you can define your specific configuration for all CDAP components. The cdap-site.xml.example file shows the properties that usually require customization for all installations.

    Similar to Hadoop, CDAP utilizes the alternatives framework to allow you to easily switch between multiple configurations. The alternatives system is used for ease of management and allows you to choose between different directories that fulfill the same purpose.

    Simply copy the contents of /etc/cdap/conf.dist into a directory of your choice (such as /etc/cdap/conf.mycdap) and make all of your customizations there. Then run the alternatives command to point the /etc/cdap/conf symlink to your custom directory /etc/cdap/conf.mycdap:

    Code Block
    $ sudo cp -r /etc/cdap/conf.dist /etc/cdap/conf.mycdap
    $ sudo update-alternatives --install /etc/cdap/conf cdap-conf /etc/cdap/conf.mycdap 10
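
To confirm which directory the /etc/cdap/conf symlink currently resolves to after running the commands above, a small helper along these lines may be useful. This is a minimal sketch, assuming a readlink that supports the -f option (as GNU coreutils does); the function name conf_target is hypothetical and not part of CDAP.

```shell
#!/bin/sh
# Print the directory that a configuration symlink ultimately resolves to,
# following every intermediate link (for example, through /etc/alternatives).
# conf_target is a hypothetical helper name, not something CDAP provides.
conf_target() {
  readlink -f "$1"
}

# Example invocation (commented out so that sourcing this block has no side effects):
# conf_target /etc/cdap/conf
```

On Debian-based systems, running update-alternatives --display cdap-conf also lists the registered alternatives and their priorities.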
  2. Configure the cdap-site.xml after you have installed the CDAP packages.

    To configure your particular installation, modify cdap-site.xml, using cdap-site.xml.example as a model. (See the appendix for a listing of cdap-site.xml.example, the minimal cdap-site.xml file required.)

    Customize your configuration by creating (or editing, if it already exists) the file conf/cdap-site.xml and setting the appropriate properties:

    Code Block
    $ sudo cp -f /etc/cdap/conf.mycdap/cdap-site.xml.example /etc/cdap/conf.mycdap/cdap-site.xml
    $ sudo vi /etc/cdap/conf.mycdap/cdap-site.xml
  3. If necessary, customize the file cdap-env.sh after you have installed the CDAP packages.

    Environment variables to be included in the environment used when launching CDAP can be set in the cdap-env.sh file, usually located at /etc/cdap/conf/cdap-env.sh.

    This is only necessary if you need to customize the environment in which CDAP is launched, as described below under Local Storage Configuration.

  4. Depending on your installation, you may need to set these properties:

    1. Check that the zookeeper.quorum property in conf/cdap-site.xml is set to the ZooKeeper quorum string, a comma-delimited list of host:port entries (using fully-qualified domain names) for the ZooKeeper quorum:

      Code Block
      <property>
        <name>zookeeper.quorum</name>
        <value>FQDN1:2181,FQDN2:2181/${root.namespace}</value>
        <description>
          ZooKeeper quorum string; specifies the ZooKeeper host:port;
          substitute the quorum for the components shown here (FQDN1:2181,FQDN2:2181)
        </description>
      </property>
    2. Check that the router.server.address property in conf/cdap-site.xml is set to the hostname of the CDAP Router. The CDAP UI uses this property to connect to the Router:

      Code Block
      <property>
        <name>router.server.address</name>
        <value>{router-host-name}</value>
        <description>CDAP Router address to which CDAP UI connects</description>
      </property>
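
To check values such as zookeeper.quorum and router.server.address without opening the file in an editor, a small reader function can help. This is a rough sketch rather than a real XML parser: it assumes each <name> and <value> element sits on its own line, as in the examples above, and the function name get_property is hypothetical.

```shell
#!/bin/sh
# Print the value of one property from a Hadoop-style *-site.xml file.
#   $1 = path to the XML file, $2 = property name
# Assumes <name> and <value> are on separate lines, with the value on a
# single line. get_property is a hypothetical helper, not a CDAP tool.
get_property() {
  awk -v name="$2" '
    # Remember that we have seen the requested <name> element.
    index($0, "<name>" name "</name>") { found = 1; next }
    # The first <value> line after it holds the value; strip the tags.
    found && /<value>/ {
      sub(/^.*<value>/, "")
      sub(/<\/value>.*$/, "")
      print
      exit
    }
  ' "$1"
}

# Example invocations (commented out; the path assumes the custom directory used earlier):
# get_property /etc/cdap/conf.mycdap/cdap-site.xml zookeeper.quorum
# get_property /etc/cdap/conf.mycdap/cdap-site.xml router.server.address
```

The function prints nothing when the property is absent, which makes it easy to spot a setting that still needs to be added.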
    3. Check that there exists in HDFS a user directory for the hdfs.user property of conf/cdap-site.xml. By default, the HDFS user is yarn. If necessary, create the directory:

      Code Block
      $ su hdfs
      $ hadoop fs -mkdir -p /user/yarn && hadoop fs -chown yarn:yarn /user/yarn
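
The "check, and create if necessary" step above can also be written as a single function that only creates the directory when it is missing. This is a minimal sketch, assuming it is run as a user with HDFS superuser rights (such as hdfs) and that the hadoop command is on the PATH; the function name ensure_hdfs_home is hypothetical.

```shell
#!/bin/sh
# Create /user/<name> in HDFS, owned by <name>, unless it already exists.
#   $1 = HDFS user name (for example, yarn)
# ensure_hdfs_home is a hypothetical helper name, not part of CDAP or Hadoop.
ensure_hdfs_home() {
  if hadoop fs -test -d "/user/$1"; then
    # The directory is already present; leave it untouched.
    echo "exists: /user/$1"
  else
    hadoop fs -mkdir -p "/user/$1" &&
      hadoop fs -chown "$1:$1" "/user/$1" &&
      echo "created: /user/$1"
  fi
}

# Example invocation (commented out so that sourcing this block changes nothing):
# ensure_hdfs_home yarn
```

The same function applies unchanged if a different HDFS user is configured through hdfs.user.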
    4. If you want to use an HDFS directory with a name other than /cdap:

      1. Create the HDFS directory you want to use, such as /myhadoop/myspace.

      2. Create an hdfs.namespace property for the HDFS directory in conf/cdap-site.xml:

        Code Block
        <property>
          <name>hdfs.namespace</name>
          <value>/myhadoop/myspace</value>
          <description>Default HDFS namespace</description>
        </property>
      3. Check that the default HDFS user yarn owns that HDFS directory.
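
One way to perform the ownership check above is to compare the owner reported by HDFS with the expected user. This is a sketch under the assumption that the installed Hadoop version supports the %u (owner user name) format of hadoop fs -stat; the function name hdfs_owner_is is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given HDFS path is owned by the expected user.
#   $1 = HDFS path, $2 = expected owner
# hdfs_owner_is is a hypothetical helper name, not part of CDAP or Hadoop.
hdfs_owner_is() {
  [ "$(hadoop fs -stat %u "$1")" = "$2" ]
}

# Example invocation (commented out; /myhadoop/myspace is the sample directory from the text):
# hdfs_owner_is /myhadoop/myspace yarn || echo "owner mismatch"
```

If the check fails, hadoop fs -chown, run as the HDFS superuser, can assign the directory to the correct user.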

    5. If you want to use an HDFS user other than yarn, such as my_username:

      1. Check that a corresponding user exists (creating it if necessary) on all machines in the cluster on which YARN is running (typically, all of the machines).

      2. Create an hdfs.user property for that user in conf/cdap-site.xml:

        Code Block
        <property>
          <name>hdfs.user</name>
          <value>my_username</value>
          <description>User for accessing HDFS</description>
        </property>
      3. Check that the HDFS user owns the HDFS directory described by hdfs.namespace on all machines.

      4. Check that there exists in HDFS a /user/ directory for that HDFS user, as described above, such as:

        Code Block
        $ su hdfs
        $ hadoop fs -mkdir -p /user/my_username && hadoop fs -chown my_username:my_username /user/my_username
      5. If you use an HDFS user other than yarn, you must either use a secure cluster or use the LinuxContainerExecutor instead of the DefaultContainerExecutor. (Because of how DefaultContainerExecutor works, other containers will launch as yarn rather than as the specified hdfs.user.) On Kerberos-enabled clusters, you must use LinuxContainerExecutor, as the DefaultContainerExecutor will not work correctly.

    6. To use the ad-hoc querying capabilities of CDAP, ensure the cluster has a compatible version of Hive installed. See the section on Hadoop Compatibility. To use this feature on secure Hadoop clusters, please see the instructions on configuring secure Hadoop.

      Note: Some versions of Hive contain a bug that may prevent the CDAP Explore Service from starting up. See CDAP-1865 for more information about the issue. If the CDAP Explore Service fails to start and you see a javax.jdo.JDODataStoreException: Communications link failure in the log, try adding this property to the Hive hive-site.xml file:

      Code Block
      <property>
        <name>datanucleus.connectionPoolingType</name>
        <value>DBCP</value>
      </property>
    7. If Hive is not going to be installed, disable the CDAP Explore Service in conf/cdap-site.xml (by default, it is enabled):

      Code Block
      <property>
        <name>explore.enabled</name>
        <value>false</value>
        <description>Enable Explore functionality</description>
      </property>
    8. CDAP can publish notifications of metadata updates to an external Apache Kafka instance. If you would like to use this capability, see Audit logging for details on the configuration settings and an example of the output.

...