Debugging an Application in CDAP Sandbox

Any CDAP application can be debugged in the CDAP Sandbox by attaching a remote debugger to the CDAP JVM. To enable remote debugging:

  1. Start the CDAP Sandbox with --enable-debug, optionally specifying a port (default is 5005).

    CDAP should confirm that the debugger port is open with a message such as "Remote debugger agent started on port 5005".

  2. Deploy an application (for example, SportResults) to CDAP by dragging and dropping the SportResults.jar file from the /examples/SportResults directory onto the CDAP UI.

  3. Open the SportResults application in an IDE and connect to the remote debugger.

For more information, see Attaching a Debugger.

Note: Currently, debugging is not supported under Windows.

Debugging an Application in Distributed CDAP

In distributed mode, an application does not run in a single JVM. Instead, its programs are spread across multiple, often many, containers in the Hadoop or Spark cluster, so there is no single place to debug the entire application.

You can, however, debug any individual container by attaching a remote debugger to it. To debug a container, you need to start the program with debugging enabled by making an HTTP request to the program's URL. For example, the following request starts a program for debugging:

POST /v3/namespaces/default/apps/WordCount/spark/WordCounter/debug

Using curl:

$ curl -w"\n" -X POST "http://<hostname>:11015/v3/namespaces/default/apps/WordCount/spark/WordCounter/debug"
> curl -X POST "http://<hostname>:11015/v3/namespaces/default/apps/WordCount/spark/WordCounter/debug"

Note that this URL differs from the URL for starting the program only by the last path component (debug instead of start; see the Lifecycle Microservices). You can pass in runtime arguments in the exact same way as you normally would start a program.
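As a minimal sketch of how the two URLs relate, the Python snippet below builds (without sending) a start request and a debug request. The helper name program_request and the runtime argument key input.path are hypothetical, and passing runtime arguments as a JSON map in the request body is an assumption about how a program is normally started; check the Lifecycle Microservices for your CDAP version.

```python
import json
import urllib.request


def program_request(host, app, program_type, program, action,
                    runtime_args=None, namespace="default", port=11015):
    """Build (but do not send) a POST request for a program lifecycle action.

    Hypothetical helper: only the URL layout comes from the examples above.
    """
    url = (f"http://{host}:{port}/v3/namespaces/{namespace}"
           f"/apps/{app}/{program_type}/{program}/{action}")
    # Assumption: runtime arguments travel as a JSON map in the request body.
    body = json.dumps(runtime_args).encode("utf-8") if runtime_args else None
    return urllib.request.Request(url, data=body, method="POST")


start = program_request("localhost", "WordCount", "spark", "WordCounter", "start")
debug = program_request("localhost", "WordCount", "spark", "WordCounter", "debug",
                        runtime_args={"input.path": "/tmp/words"})  # hypothetical argument

# The two URLs share everything except the last path component.
print(start.full_url)
print(debug.full_url)
```

To actually issue the request, you could pass the returned object to urllib.request.urlopen, replacing localhost with the host name of your CDAP instance.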

Once the program is running, you can retrieve information about its containers, such as the host, debugging port, and memory, by querying the HTTP endpoint:

GET /v3/namespaces/default/apps/WordCount/spark/WordCounter/live-info

or, using the CDAP CLI:

$ cdap cli get spark live WordCount.WordCounter
> cdap cli get spark live WordCount.WordCounter

The response is formatted in JSON and, pretty-printed, would look similar to this:

{
  "app": "WordCount",
  "containers": [
    {
      "container": "container_1397069870124_0010_01_000002",
      "debugPort": 42071,
      "host": "node-1004.my.cluster.net",
      "instance": 0,
      "memory": 512,
      "name": "unique",
      "virtualCores": 1
    },
    ...
    {
      "container": "container_1397069870124_0010_01_000005",
      "debugPort": 37205,
      "host": "node-1003.my.cluster.net",
      "instance": 0,
      "memory": 512,
      "name": "splitter",
      "virtualCores": 1
    }
  ],
  "id": "WordCounter",
  "runtime": "distributed",
  "type": "spark",
  "yarnAppId": "application_1397069870124_0010"
}

The response includes the YARN application ID and the YARN container IDs but, more importantly, the host name and debugging port of each container. For example, for one of the containers, the host is node-1003.my.cluster.net and the debugging port is 37205. You can now attach your debugger to the container's JVM (see Attaching a Debugger).

The corresponding HTTP requests for the RetrieveCounts service of this application would be:

POST /v3/namespaces/default/apps/WordCount/services/RetrieveCounts/debug
GET /v3/namespaces/default/apps/WordCount/services/RetrieveCounts/live-info

Analysis of the response would give you the host names and debugging ports for all instances of the service.
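That analysis can be sketched in a few lines of Python. The function name debug_endpoints is hypothetical, and the sample input is a trimmed copy of the example response shown earlier; the sketch assumes that every container entry you want to debug carries the host and debugPort fields seen there.

```python
import json


def debug_endpoints(live_info_json):
    """Return (name, instance, host, debugPort) for each container with a debug port."""
    info = json.loads(live_info_json)
    endpoints = []
    for container in info.get("containers", []):
        # Skip containers that were not started with debugging enabled.
        if "debugPort" in container:
            endpoints.append((container["name"], container["instance"],
                              container["host"], container["debugPort"]))
    return endpoints


# Trimmed copy of the example live-info response.
sample = """
{
  "app": "WordCount",
  "containers": [
    {"container": "container_1397069870124_0010_01_000002", "debugPort": 42071,
     "host": "node-1004.my.cluster.net", "instance": 0, "memory": 512,
     "name": "unique", "virtualCores": 1},
    {"container": "container_1397069870124_0010_01_000005", "debugPort": 37205,
     "host": "node-1003.my.cluster.net", "instance": 0, "memory": 512,
     "name": "splitter", "virtualCores": 1}
  ],
  "id": "WordCounter",
  "runtime": "distributed",
  "type": "spark",
  "yarnAppId": "application_1397069870124_0010"
}
"""

for name, instance, host, port in debug_endpoints(sample):
    print(f"{name}[{instance}] -> {host}:{port}")
```

Each host and port pair printed this way can be entered directly into the remote debugging configuration of your IDE.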

Attaching a Debugger

Debugging with IntelliJ

Note: These instructions were developed with IntelliJ v13.1.2. You may need to adjust them for your installation or version.

  1. From the IntelliJ toolbar, select Run > Edit Configurations.

  2. Click + and choose Remote.

  3. Create a debug configuration by entering a name, for example, CDAP.

  4. In the Host field, type the host name, for example, localhost or node-1003.my.cluster.net.

  5. In the Port field, type the debugging port, for example, 5005.

  6. To start the debugger, click Run > Debug > CDAP.

  7. Set a breakpoint in any code block.

  8. Start the program in the CDAP UI.

  9. Perform an operation that executes the code containing the breakpoint. Execution stops at the breakpoint, and you can proceed with debugging.

Debugging with Eclipse

Note: These instructions were developed with Eclipse IDE for Java Developers v4.4.0. You may need to adjust them for your installation or version.

  1. In Eclipse, click Run > Debug configurations.

  2. To create a new launch configuration, in the list on the left of the window, double-click Remote Java Application.

  3. Enter a name and project, for example, CDAP.

  4. In the Host field, type the host name, for example, localhost or node-1003.my.cluster.net.

  5. In the Port field, type the debugging port, for example, 5005.

  6. To start the debugger, in your project, click Debug.

  7. Set a breakpoint in any code block.

  8. Start the program in the CDAP UI.

  9. Perform an operation that executes the code containing the breakpoint. Execution stops at the breakpoint, and you can proceed with debugging.

Debugging the Transaction Manager (Advanced Use)

In this advanced use section, we will explain in depth how transactions work internally. Transactions are introduced in the Transaction System.

A transaction is defined by an identifier, which contains the timestamp, in milliseconds, of its creation. This identifier, also called the write pointer, represents the version that this transaction will use for all of its writes. It is also used to determine the order between transactions. A transaction with a smaller write pointer than another transaction must have been started earlier.

The Transaction Manager (or TM) uses the write pointers to implement Optimistic Concurrency Control by maintaining state for all transactions that could be facing concurrency issues.

Transaction Manager States

The state of the TM is defined by these structures and rules:

Transaction Lifecycle States

Here are the states a transaction goes through in its lifecycle:

The size of the committed change sets structure determines how fast conflict detection is performed. Fortunately, not all committed writes need to be remembered, only those that may conflict with in-progress transactions. This is why only the writes committed after the start of the oldest in-progress transaction that is not long-running are stored in this structure, and why transactions that participate in conflict detection must remain short in duration. The older they are, the bigger the committed change sets structure will be and the longer conflict detection will take.

When conflict detection takes longer, so does committing a transaction, and the transaction stays longer in the in-progress set. If this happens, the whole transaction system can become slow.
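To illustrate why an old in-progress transaction makes the committed change sets structure grow, here is a deliberately simplified Python model of optimistic concurrency control. It is a sketch under stated assumptions, not CDAP's implementation: the class ToyTransactionManager and its methods are hypothetical, a plain counter stands in for timestamp-based write pointers, and long-running transactions and the excluded set are not modeled.

```python
class ToyTransactionManager:
    """Toy model of optimistic concurrency control (not CDAP's actual code)."""

    def __init__(self):
        self._clock = 0          # stand-in for timestamp-based pointers
        self.in_progress = set() # write pointers of running transactions
        self.committed = {}      # commit pointer -> keys changed by that commit

    def _tick(self):
        self._clock += 1
        return self._clock

    def start(self):
        write_pointer = self._tick()
        self.in_progress.add(write_pointer)
        return write_pointer

    def commit(self, write_pointer, changes):
        """Return True if committed, False if a conflict was detected."""
        changes = frozenset(changes)
        # Conflict: a change set committed after this transaction started
        # touches at least one of the same keys.
        conflict = any(cp > write_pointer and keys & changes
                       for cp, keys in self.committed.items())
        self.in_progress.discard(write_pointer)
        if not conflict:
            self.committed[self._tick()] = changes
        self._prune()
        return not conflict

    def _prune(self):
        # Only change sets committed after the oldest in-progress
        # transaction started can still cause a conflict.
        oldest = min(self.in_progress, default=None)
        if oldest is None:
            self.committed.clear()
        else:
            self.committed = {cp: keys for cp, keys in self.committed.items()
                              if cp > oldest}


tm = ToyTransactionManager()
old_tx = tm.start()                  # stays open while others commit
for i in range(3):
    tx = tm.start()
    tm.commit(tx, {f"row{i}"})
retained = len(tm.committed)         # nothing could be pruned while old_tx is open
old_commit_ok = tm.commit(old_tx, {"row1"})
after = len(tm.committed)            # pruned once no transaction is in progress

print(retained, old_commit_ok, after)
```

In this model, every commit made while old_tx is open must be retained and scanned, which mirrors the slowdown described above.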

Dumping the Transaction Manager

CDAP comes bundled with a script that dumps the state of the internal transaction manager into a local file for further investigation. If your CDAP instance tends to become slow, you can use this tool to identify the offending transactions. The script is invoked as cdap debug transactions (on Windows, it is tx-debugger.bat).

To download a snapshot of the state of the TM of your CDAP instance, use the command:

$ cdap debug transactions view --host <name> [--save <filename>]
> tx-debugger.bat view --host <name> [--save <filename>]

where name is the host name of your CDAP instance, and the optional filename specifies where the snapshot should be saved. This command will print statistics about all the structures that define the state of the TM.

You can also load a snapshot that has already been saved locally with the command:

$ cdap debug transactions view --filename <filename>
> tx-debugger.bat view --filename <filename>

where filename specifies the location where the snapshot has been saved.

Here are options that you can use with the tx-debugger view commands:

Although transactions do not record which task launched them, whether it was a Spark or MapReduce program, you can match the time they were started with the activity in your CDAP instance to track down potential issues.
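Because a write pointer contains the creation timestamp in milliseconds, you can recover an approximate start time from it. The sketch below assumes that the pointer is the millisecond timestamp multiplied by a fixed factor; the factor of 1,000,000 and the names ASSUMED_POINTERS_PER_MS and write_pointer_to_utc are assumptions for illustration, so verify the encoding against your CDAP version before relying on it.

```python
from datetime import datetime, timezone

# Assumption: write pointer = creation time in milliseconds * this factor.
ASSUMED_POINTERS_PER_MS = 1_000_000


def write_pointer_to_utc(write_pointer, per_ms=ASSUMED_POINTERS_PER_MS):
    """Convert a write pointer to the UTC time at which the transaction started."""
    millis = write_pointer // per_ms
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Made-up pointer for illustration only.
example_pointer = 1397069870124 * ASSUMED_POINTERS_PER_MS + 7
print(write_pointer_to_utc(example_pointer).isoformat())
```

The resulting time can then be compared with program start times or log entries from the same period.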

If you really know what you are doing and you spot a transaction in the in-progress set that should be in the excluded set, you can use this command to invalidate it:

$ cdap debug transactions invalidate --host <name> --transaction <writePtr>
> tx-debugger.bat invalidate --host <name> --transaction <writePtr>

Invalidating a transaction is useful when you know for certain that its writes should be discarded, because those writes will then be removed from the affected Tables.