Testing a CDAP Application

Strategies in Testing Applications: Test Framework

CDAP’s Test Framework provides a convenient way to unit-test your applications. The framework starts an in-memory CDAP runtime and lets you deploy an application; start, stop, and monitor programs; access datasets to validate processing results; and retrieve metrics from the application.

The base class for such tests is TestBase, which is packaged separately from the API in its own artifact because it depends on CDAP’s runtime classes. You can include it in your test dependencies in one of two ways:

include all JAR files in the lib directory of the CDAP Sandbox installation, or
include the cdap-unit-test artifact in your Maven test dependencies (as shown in the pom.xml file of the CDAP Sandbox examples):
. . .
<dependency>
  <groupId>io.cdap.cdap</groupId>
  <artifactId>cdap-unit-test</artifactId>
  <version>${cdap.version}</version>
  <scope>test</scope>
</dependency>
. . .

Note that for building an application, you only need to include the CDAP API in your dependencies. For testing, however, you need the CDAP runtime. To build your test case, extend the TestBase class.

Running Tests with Spark

The TestBase class included in the cdap-unit-test3_2.12 dependency will run programs using Spark and Scala 2.12.
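As a sketch, the dependency can be declared in the same way as cdap-unit-test. The artifact name is the one given above; the groupId and the ${cdap.version} property are carried over from the cdap-unit-test declaration and are assumed, not confirmed, to apply to this artifact as well:

```xml
<!-- Assumption: same groupId and version property as cdap-unit-test -->
<dependency>
  <groupId>io.cdap.cdap</groupId>
  <artifactId>cdap-unit-test3_2.12</artifactId>
  <version>${cdap.version}</version>
  <scope>test</scope>
</dependency>
```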

Running Tests from an IDE

When running tests from an IDE such as IntelliJ or Eclipse, increase the memory available to the JUnit tests that the IDE runs. We suggest starting with:

-Xmx1024m
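One way to confirm that the IDE actually passed the setting to the test JVM is to print the maximum heap size from inside that JVM. The following is a minimal sketch using only the JDK; the class name HeapSettings and its method are hypothetical and not part of CDAP.

```java
/** Hypothetical helper for checking the heap limit of the current JVM. */
final class HeapSettings {
  private HeapSettings() { }

  /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
  static long maxHeapMegabytes() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    System.out.println("Max heap available to this JVM: " + maxHeapMegabytes() + " MB");
  }
}
```

The reported value may be somewhat lower than the configured 1024 MB, because some JVMs exclude part of the heap from this figure. A value far below the configured amount suggests that the option was not applied to the test run configuration.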

Strategies in Testing MapReduce Programs (Deprecated)

We can write unit tests for MapReduce programs. Let's write a test case for an application that uses MapReduce.

The PurchaseTest class should extend TestBase:

public class PurchaseTest extends TestBase {
  @Test
  public void test() throws Exception {
    // Test body: the steps described below go here.
  }
}

The PurchaseApp application can be deployed using the deployApplication method from the TestBase class:

The MapReduce program reads from the purchases dataset. As a first step, populate the purchases dataset by running the PurchaseFlow and sending data to the purchaseStream stream:

Start the MapReduce and wait for a maximum of 60 seconds:

We can then verify that the MapReduce program ran correctly by using the PurchaseHistoryService to retrieve a customer's purchase history:

The assertion will verify that the correct result was received.
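The retrieval step can be sketched with plain JDK classes. The sketch assumes the test has already obtained the base URL of the running PurchaseHistoryService from the test framework; the class name ServiceResponseReader, the method fetch, and the history/joe path used in the note below are hypothetical and are not taken from the CDAP examples.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads the response of a service endpoint as text. */
final class ServiceResponseReader {
  private ServiceResponseReader() { }

  /**
   * Issues a GET request for the path resolved against the base URL and
   * returns the response body as a UTF-8 string. The base URL should end
   * with a slash so that the path is appended rather than replacing the
   * last segment. Throws IOException if the response code is not 200.
   */
  static String fetch(URL baseUrl, String path) throws IOException, URISyntaxException {
    URL url = baseUrl.toURI().resolve(path).toURL();
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      conn.setRequestMethod("GET");
      conn.setConnectTimeout(5000);
      conn.setReadTimeout(5000);
      int code = conn.getResponseCode();
      if (code != HttpURLConnection.HTTP_OK) {
        throw new IOException("Unexpected response code " + code + " for " + url);
      }
      try (InputStream in = conn.getInputStream()) {
        // Copy the whole body into memory; test responses are small.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
          out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
      }
    } finally {
      conn.disconnect();
    }
  }
}
```

In a test, the returned string would typically be parsed and compared against the purchases that were sent earlier, for example by asserting on the result of ServiceResponseReader.fetch(serviceUrl, "history/joe"), where serviceUrl and the path are placeholders.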

Strategies in Testing Spark Programs

Let's write a test case for an application that uses a Spark program.

The SparkPageRankTest class should extend TestBase:

The application under test can be deployed using the deployApplication method from the TestBase class:

The Spark program reads from the backlinkURLs dataset. As a first step, populate the backlinkURLs dataset by running the BackLinkFlow and sending data to the backlinkURLStream stream:

Start the Spark program and wait for a maximum of 60 seconds:

We verify that the Spark program ran correctly by using the Ranks service to check the results:

The assertion will verify that the correct result was received.

Strategies in Testing Artifacts

The Test Framework provides methods to create and deploy JAR files as artifacts. This lets you test the creation of multiple applications from the same artifact, as well as the use of plugin artifacts.

To add an artifact containing an application class:

The first argument is the ID of the artifact; the second is the application class; and the remaining arguments are packages that should be included in the Export-Packages manifest attribute bundled in the JAR. The framework traces the dependencies of the specified application class to create a JAR with those dependencies. This mimics what happens when you build your application JAR using Maven.

An application can then be deployed using that artifact:

Plugins extending the artifact can also be added:

The first argument is the ID of the plugin artifact; the second is the parent artifact it extends; and the remaining arguments are classes that should be bundled in the JAR. The packages of all these classes are included in the Export-Packages manifest attribute bundled in the JAR. When adding a plugin artifact this way, it is important to include all classes in your plugin packages, even if they are not used in your test case. This ensures that the framework can trace all required dependencies and build the JAR correctly.

The examples are taken from the DataPipelineTest and HydratorTestBase classes of CDAP pipelines.

Validating Test Data with SQL

If the datasets involved in the test case are record-scannable, as described in Data Exploration, the easiest way to verify that a test produced the right data is often to run a SQL query. This can be done using a JDBC connection obtained from the test base:

The JDBC connection does not implement the full JDBC functionality: it does not allow variable replacement and will not allow you to make any changes to datasets. But it is sufficient to perform test validation: you can create or prepare statements and execute queries, then iterate over the result set and validate its contents.
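The validation loop described above can be sketched with standard java.sql types only. The helper below is hypothetical (the class QueryResults and the method firstColumn are not part of CDAP); it takes whatever connection the test base provides, runs a query without any variable replacement, and collects the first column of each row so that the test can assert on it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects query results for assertions in a test. */
final class QueryResults {
  private QueryResults() { }

  /**
   * Executes the given read-only query and returns the first column of
   * every row as a string, in the order the rows are returned.
   */
  static List<String> firstColumn(Connection connection, String sql) throws SQLException {
    List<String> values = new ArrayList<>();
    // The statement and result set are closed automatically; the caller
    // remains responsible for closing the connection.
    try (PreparedStatement statement = connection.prepareStatement(sql);
         ResultSet results = statement.executeQuery()) {
      while (results.next()) {
        values.add(results.getString(1));
      }
    }
    return values;
  }
}
```

A test could then compare the returned list with the expected keys, for example for a query such as SELECT key FROM mytable WHERE value = '1', where mytable is a placeholder for the table that backs your dataset.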

Configuring CDAP Runtime for Test Framework

The TestBase class inherited by your test class starts an in-memory CDAP runtime before executing any test methods. Sometimes you may need to configure the CDAP runtime to suit your specific requirements. For example, if your test does not use SQL queries, you can turn off the explore service to reduce startup and shutdown times.

You can alter the configuration of the CDAP runtime by applying a JUnit @ClassRule to a TestConfiguration instance. For example:

Refer to cdap-site.xml for the set of configuration properties available in CDAP.