What are the memory and core requirements for CDAP?

The requirements are governed by two sources, CDAP and YARN, and are described here.

How do I set the CDAP properties for components running on multiple machines?

In the configuration file cdap-site.xml, there are numerous properties that specify an IP address where a service is running, such as router.server.address, data.tx.bind.address, app.bind.address, and router.bind.address.

Our convention is that:

For *.bind.address properties, it is often easiest just to set these to '0.0.0.0' to listen on all interfaces.

The *.server.* properties are used by clients to connect to another remote service. The only one you should need to configure initially is router.server.address, which the UI uses to connect to the router. Ideally, routers running in production would sit behind a load balancer, and you would set router.server.address to the load balancer's address. Alternatively, you could configure each UI instance to point to a particular router; if you have both the UI and a router running on each node, you could use '127.0.0.1'.
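
To review how these properties are currently set, a short shell sketch such as the following can list each *.bind.address and *.server.address property with its value. The helper name list_address_props and the file location /etc/cdap/conf/cdap-site.xml are assumptions for illustration, and the parsing assumes each <name> and <value> element sits on a single line.

```shell
# Hypothetical helper: print "name = value" for every *.bind.address and
# *.server.address property found in the given cdap-site.xml file.
list_address_props() {
  awk '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name ~ /\.(bind|server)\.address$/ {
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print name " = " value
      name = ""
    }
  ' "$1"
}

# Assumed location of the configuration file; adjust to your installation.
if [ -r /etc/cdap/conf/cdap-site.xml ]; then
  list_address_props /etc/cdap/conf/cdap-site.xml
fi
```

Any *.bind.address that is not '0.0.0.0', or a router.server.address that does not point at a reachable router or load balancer, is a good first thing to check.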

How do I use YARN with the Linux Container Executor?

If you have YARN configured to use LinuxContainerExecutor (see the setting for yarn.nodemanager.container-executor.class):

Where is the CDAP CLI (Command Line Interface) in Distributed mode?

If you've installed the cdap-cli RPM or DEB, it's located under /opt/cdap/cli/bin. If you have installed CDAP manually, you can add this location to your PATH so that you don't need to type the full path to the script every time.
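
As a minimal sketch of the PATH change, the following appends the CLI directory only if it is not already present. The variable name CDAP_CLI_BIN is an assumption for illustration; the directory is the one given above. Placing these lines in a shell profile would make the change persistent.

```shell
# Directory that holds the CDAP CLI scripts (from the package layout above).
CDAP_CLI_BIN=/opt/cdap/cli/bin

# Append it to PATH once; do nothing if it is already listed.
case ":$PATH:" in
  *":$CDAP_CLI_BIN:"*) ;;
  *) PATH="$PATH:$CDAP_CLI_BIN"; export PATH ;;
esac
```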

Note: These commands will list the contents of the package cdap-cli, once it has been installed:

$ rpm -ql cdap-cli
$ dpkg -L cdap-cli

I've followed the install instructions, yet CDAP does not start. What next?

If you have followed the installation instructions and CDAP did not start up, check:

The CDAP UI is showing a message "namespace cannot be found".

This indicates that the UI cannot connect to the CDAP system service containers running in YARN.

I don't see the CDAP Master service on YARN.

YARN Application shows ACCEPTED for some time but then fails.

It's possible that YARN can't extract the JARs to /tmp, either due to a lack of disk space or because of permissions.
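
Both causes can be checked quickly from a shell on the affected node. The sketch below is one way to do it; the helper name check_dir is an assumption, and the check should be run as the user that YARN containers run as, since write access differs by user.

```shell
# Hypothetical helper: report free space, ownership, and writability of a directory.
check_dir() {
  dir=$1
  df -h "$dir"     # free space on the filesystem that holds the directory
  ls -ld "$dir"    # owner, group, and permission bits
  if [ -w "$dir" ]; then
    echo "$dir is writable by $(id -un)"
  else
    echo "$dir is NOT writable by $(id -un)"
    return 1
  fi
}

check_dir /tmp
```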

The CDAP Master log shows permissions issues.

Ensure that hdfs:///${hdfs.namespace} and hdfs:///user/${hdfs.user} exist and are owned by ${hdfs.user}. (hdfs.namespace and hdfs.user are defined in your installation's cdap-site.xml file.)
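
The sketch below prints, without running them, the HDFS commands that would inspect and fix those two directories, so they can be reviewed first and then run as a user with HDFS superuser rights. The values /cdap and yarn are assumptions taken from the sample log in this section; substitute the hdfs.namespace and hdfs.user values from your own cdap-site.xml. The helper name print_hdfs_dir_commands is hypothetical.

```shell
# Assumed values; replace with hdfs.namespace and hdfs.user from cdap-site.xml.
HDFS_NAMESPACE=${HDFS_NAMESPACE:-/cdap}
HDFS_USER=${HDFS_USER:-yarn}

# Hypothetical helper: print the commands that list, create, and chown
# the CDAP namespace directory and the user's home directory in HDFS.
print_hdfs_dir_commands() {
  for dir in "$HDFS_NAMESPACE" "/user/$HDFS_USER"; do
    echo "hdfs dfs -ls -d $dir"
    echo "hdfs dfs -mkdir -p $dir"
    echo "hdfs dfs -chown $HDFS_USER $dir"
  done
}

print_hdfs_dir_commands
```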

In other cases, the error should show which directory it is attempting to access, such as:

2015-10-30 22:14:27,528 - ERROR [ STARTING:...MasterServiceMain$2@452] - master.services failed with exception; restarting with back-off
java.lang.RuntimeException: java.io.IOException: failed to copy bundle from file:/tmp/appMaster.37a86cfd....jar5052.tmp
to hdfs://nameservice/cdap/twill/master.services/b4ce41a5e7e5.../appMaster.37a86cfd-1d88.jar
      at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
      ...
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Permission denied: user=yarn, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)

or:

Deploy failed: Could not create temporary directory at: /var/tmp/cdap/data/namespaces/phoenix/tmp

Don't hesitate to ask for help at cdap-user@googlegroups.com.

The CDAP Master log shows an error about the dataset service not being found.

If you see an error such as:

2015-05-15 12:15:53,028 - ERROR [heartbeats-scheduler:c.c.c.d.s.s.MDSStreamMetaStore$1@71]
- Failed to access app.meta table io.cdap.cdap.api.dataset.DatasetManagementException:
Cannot retrieve dataset instance app.meta info, details: Response code: 407,
message:'Proxy Authentication Required',
body: '<HTML><HEAD> <TITLE>Access Denied</TITLE> </HEAD>

As the log indicates, this error can be caused by a proxy setting. CDAP services make HTTP requests to each other internally; requests to the dataset service are one example. Depending on your proxy and its settings, these requests can end up being sent to the proxy instead.

One item to check is that your system's network settings exclude both localhost and 127.0.0.1 from proxy routing. If they don't, the services will not be able to communicate with each other, and you'll see error messages such as these. You can exclude both addresses from the proxy by using:

$ export no_proxy="localhost,127.0.0.1"
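
To confirm that the setting is actually in effect in the environment that starts the CDAP services, a check along these lines may help. The helper name proxy_excludes_local is an assumption, and the sketch expects a comma-separated list without spaces, as in the export above.

```shell
# Hypothetical helper: succeed only if no_proxy lists both localhost and 127.0.0.1.
proxy_excludes_local() {
  for host in localhost 127.0.0.1; do
    case ",${no_proxy:-}," in
      *",$host,"*) ;;
      *) echo "no_proxy is missing $host"; return 1 ;;
    esac
  done
}

export no_proxy="localhost,127.0.0.1"
proxy_excludes_local && echo "proxy exclusions look correct"
```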

CDAP services on Distributed CDAP aren't starting up due to an exception. What should I do?

If the CDAP services on a Distributed CDAP installation don't start up due to a java.lang.ClassNotFoundException, the cdap-master logs under /var/log/cdap/master*.log will show errors such as these:

"Exception in thread "main" java.lang.NoClassDefFoundError:
  io.cdap.cdap.data.runtime.main.MasterServiceMain
    at gnu.java.lang.MainThread.run(libgcj.so.10)"

Things to check as possible solutions:

  1. Check if the JDK being used is supported by CDAP:

    $ java -version
  2. Check if the CDAP user is using a correct version of the JDK:

    $ sudo su - <cdap-user>
    $ java -version
  3. Run this command to see if all the CDAP classpaths are included:

    $ /opt/cdap/master/bin/cdap classpath | tr ':' '\n'

    Expect to see (where <version> is the appropriate hbase-compat version):

    /etc/cdap/conf/
    /opt/cdap/hbase-compat-<version>/lib/*
    /opt/cdap/master/conf/
    /opt/cdap/master/lib/*

    If the classpath is incorrect, review the installation instructions and correct it. In general, the hbase-compat-<version>/lib/* and cdap/master/lib/* entries must precede any other paths that contain JARs or classes, such as the HBase classpath.
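
The ordering rule in step 3 can be checked mechanically. The sketch below reads one classpath entry per line and reports a problem if either CDAP lib entry is missing or appears after the first entry from outside CDAP. It is a heuristic under a stated assumption: any entry that does not start with /opt/cdap/ or /etc/cdap/ is treated as "another path that may contain JARs or classes". The helper name check_cdap_classpath is hypothetical.

```shell
# Hypothetical helper: read classpath entries (one per line) on stdin.
# Usage: /opt/cdap/master/bin/cdap classpath | tr ':' '\n' | check_cdap_classpath
check_cdap_classpath() {
  awk '
    NF == 0 { next }                                       # skip blank lines
    /\/opt\/cdap\/hbase-compat-.*\/lib\/\*$/ { compat = NR }
    /\/opt\/cdap\/master\/lib\/\*$/          { master = NR }
    !/^\/(opt|etc)\/cdap\// && !other        { other = NR }  # first non-CDAP entry
    END {
      if (!compat) { print "missing hbase-compat lib entry"; bad = 1 }
      if (!master) { print "missing master lib entry"; bad = 1 }
      if (other && (compat > other || master > other)) {
        print "a CDAP lib entry appears after the non-CDAP entry on line " other
        bad = 1
      }
      exit bad + 0
    }
  '
}
```

The function exits with status 0 when the order looks right and 1 otherwise, so it can be used in a script or run by hand.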

We aren't seeing any Metrics or Logs. What should we do?

Check that:

In a two-box HA setup with a replication factor of two, if one box fails to start up, metrics will not show up, though the application will still run:

[2013-10-10 20:48:46,160] ERROR [KafkaApi-1511941310]
      Error while retrieving topic metadata (kafka.server.KafkaApis)
      kafka.admin.AdministrationException:
             replication factor: 2 larger than available brokers: 1

As a last resort, cdap_kafka can be reset by stopping CDAP (including cdap_kafka), removing the Kafka znode (from within ZooKeeper use rmr /${cdap.namespace}/kafka) and restarting CDAP.

Log Saver Process throws an Out-of-Memory Error; the CDAP UI shows service "Not OK"

The CDAP Log Saver uses an internal buffer that may overflow and result in Out-of-Memory Errors when applications create excessive amounts of logs. One symptom is that the CDAP UI Services Explorer shows the log.saver service as not OK; error messages also appear in the logs.

By default, the Log Saver process is limited to 1GB of memory and the buffer keeps eight buckets of events in-memory. Each event bucket contains logs generated for one second. When it is expected that logs exceeding these settings will be produced—for example, greater than 1GB of logs generated in eight seconds—increase the memory allocated to the Log Saver or increase the number of Log Saver instances. If the cluster has limited memory or containers available, you can choose instead to decrease the number of in-memory event buckets. However, decreasing the number of in-memory buckets may lead to out-of-order log events.
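
The sizing rule above reduces to simple arithmetic: the memory limit divided by the number of one-second buckets gives a rough ceiling on the sustained log rate. The sketch below works through the default figures; the helper name max_log_rate_mb is an assumption, and the result is only an upper bound, because the Log Saver process also needs memory for everything other than the buffer.

```shell
# Hypothetical helper: rough ceiling on log volume per second (in MB),
# given the memory limit in MB and the number of one-second event buckets.
max_log_rate_mb() {
  echo $(( $1 / $2 ))
}

max_log_rate_mb 1024 8    # defaults: 1GB, eight buckets -> prints 128
max_log_rate_mb 1024 4    # halving the buckets doubles the ceiling -> prints 256
```

If applications are expected to log faster than this ceiling for several seconds at a time, that is the point at which to raise the memory, add instances, or reduce the bucket count.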

In the cdap-site.xml, you can:

See the log.saver parameters in the cdap-site.xml documentation for the list of adjustable configuration parameters and their values.

Upgrading CDAP

Can a CDAP installation be upgraded more than one version?

In general, no. (The exception is an upgrade from 2.8.x to 3.0.x.) This table lists the upgrade paths available for different CDAP versions:

Version    Upgrade Directly To
3.2.x      3.3.x
3.1.x      3.2.x
3.0.x      3.1.x
2.8.x      3.0.x
2.6.3      2.8.2

If you are doing a new installation, we recommend using the current version of CDAP.
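
Because each upgrade moves only to the next entry in the table, getting from an older release to a newer one means chaining the steps. The following sketch encodes the table above and prints the resulting chain; the function names next_version and upgrade_path are assumptions for illustration, and the mapping covers only the versions listed here.

```shell
# Direct upgrade target for a given version, per the table above.
next_version() {
  case $1 in
    2.6.3) echo 2.8.2 ;;
    2.8.*) echo 3.0.x ;;
    3.0.*) echo 3.1.x ;;
    3.1.*) echo 3.2.x ;;
    3.2.*) echo 3.3.x ;;
    *) return 1 ;;      # no further upgrade listed
  esac
}

# Print the full chain of upgrades starting from the given version.
upgrade_path() {
  v=$1
  printf '%s' "$v"
  while n=$(next_version "$v"); do
    printf ' -> %s' "$n"
    v=$n
  done
  printf '\n'
}

upgrade_path 3.0.x    # prints: 3.0.x -> 3.1.x -> 3.2.x -> 3.3.x
```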

I missed doing a step while upgrading; how do I fix my system?

If you miss a step in the upgrade process and something goes wrong, it's possible that the tables will get re-enabled before the coprocessors are upgraded. This could cause the regionservers to abort and may make it very difficult to get the cluster back to a stable state where the tables can be disabled again and complete the upgrade process.

In that case, set this configuration property in hbase-site.xml:

<property>
  <name>hbase.coprocessor.abortonerror</name>
  <value>false</value>
</property>

and restart the HBase regionservers. This will allow the regionservers to start up despite the coprocessor version mismatch. At this point, you should be able to run through the upgrade steps successfully.

At the end, remove the entry for hbase.coprocessor.abortonerror in order to ensure that data correctness is maintained.
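
Since the temporary property is easy to forget, a quick check after the upgrade can confirm that it is gone. This is a minimal sketch: the helper name abortonerror_present is an assumption, the location of hbase-site.xml depends on your distribution, and the check only confirms that the property name no longer appears in the file you pass to it.

```shell
# Hypothetical helper: succeed if the given hbase-site.xml still contains
# the hbase.coprocessor.abortonerror property.
abortonerror_present() {
  grep -q '<name>hbase\.coprocessor\.abortonerror</name>' "$1"
}

# Usage (path is a placeholder):
#   abortonerror_present /path/to/hbase-site.xml && echo "entry still present; remove it"
```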

Ask the CDAP Community for assistance

You can post a question to cdap-user@googlegroups.com.

The cdap-user mailing list is primarily for users who develop applications with the product. You can expect questions from users, release announcements, and any other discussions that we think will be helpful to users.