Testing a CDAP Application
Strategies in Testing Applications: Test Framework
CDAP comes with a convenient way to unit-test your applications with CDAP’s Test Framework. This framework starts an in-memory CDAP runtime and lets you deploy an application; start, stop and monitor programs; access datasets to validate processing results; and retrieve metrics from the application.
The base class for such tests is TestBase, which is packaged separately from the API in its own artifact because it depends on CDAP's runtime classes. You can include it in your test dependencies in one of two ways:
- include all JAR files in the lib directory of the CDAP Sandbox installation, or
- include the cdap-unit-test artifact in your Maven test dependencies (as shown in the pom.xml file of the CDAP Sandbox examples):

. . .
<dependency>
  <groupId>io.cdap.cdap</groupId>
  <artifactId>cdap-unit-test</artifactId>
  <version>${cdap.version}</version>
  <scope>test</scope>
</dependency>
. . .
Note that for building an application, you only need to include the CDAP API in your dependencies. For testing, however, you need the CDAP runtime. To build your test case, extend the TestBase class.
Running Tests with Spark
The TestBase class included in the cdap-unit-test3_2.12 dependency will run programs using Spark and Scala 2.12.
Running Tests from an IDE
When running tests from an IDE such as IntelliJ or Eclipse, increase the memory available to the JUnit tests that the IDE runs. We suggest starting with:
-Xmx1024m
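To confirm that the IDE really passes this setting on to the test JVM, one option is to print the maximum heap from inside a test run. The following is a minimal sketch that uses only the standard Runtime API; the class name HeapCheck and the helper maxHeapMiB are hypothetical names chosen for illustration and are not part of CDAP.

```java
// Hypothetical helper, not part of CDAP: reports the heap limit of the running JVM.
class HeapCheck {

  /** Returns the maximum heap the JVM will attempt to use, in MiB. */
  static long maxHeapMiB() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    // With -Xmx1024m the printed value is typically close to 1024;
    // the exact number depends on the JVM and the garbage collector in use.
    System.out.println("Max heap (MiB): " + maxHeapMiB());
  }
}
```

If the printed value is far below the configured limit, the option was probably set on the wrong run configuration.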
Strategies in Testing MapReduce Programs (Deprecated)
We can write unit tests for MapReduce programs. Let's write a test case for an application that uses MapReduce.
The PurchaseTest class should extend from TestBase:
public class PurchaseTest extends TestBase {
  @Test
  public void test() throws Exception {
    // The steps described below make up the body of this test method.
  }
}
The PurchaseApp application can be deployed using the deployApplication method from the TestBase class.
The MapReduce program reads from the purchases dataset. As a first step, populate the purchases dataset by running the PurchaseFlow and sending data to the purchaseStream stream.
Then start the MapReduce program and wait for a maximum of 60 seconds.
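The idea of waiting for at most 60 seconds can be pictured as a polling loop with a deadline. The sketch below is a generic illustration written against the Java standard library only; it is not CDAP's implementation, and the names BoundedWait and waitFor are assumptions made for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Generic illustration of a bounded wait; this is not CDAP's implementation.
class BoundedWait {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition held in time, false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier condition, long timeout, TimeUnit unit) {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // deadline reached without the condition becoming true
      }
      try {
        Thread.sleep(50); // polling interval, chosen arbitrarily for this sketch
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test would normally fail when the helper returns false, for example by asserting on the result of waitFor(() -> runFinished(), 60, TimeUnit.SECONDS), where runFinished() is a placeholder for whatever status check the test has available.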
We can then verify that the MapReduce program ran correctly by using the PurchaseHistoryService to retrieve a customer's purchase history. An assertion verifies that the correct result was received.
Strategies in Testing Spark Programs
Let's write a test case for an application that uses a Spark program.
The SparkPageRankTest class should extend from TestBase.
The SparkPageRankApp application can be deployed using the deployApplication method from the TestBase class.
The Spark program reads from the backlinkURLs dataset. As a first step, populate the backlinkURLs dataset by running the BackLinkFlow and sending data to the backlinkURLStream stream.
Then start the Spark program and wait for a maximum of 60 seconds.
We verify that the Spark program ran correctly by using the Ranks service to check the results. An assertion verifies that the correct result was received.
Strategies in Testing Artifacts
The Test Framework provides methods to create and deploy JAR files as artifacts. This lets you test the creation of multiple applications from the same artifact, as well as the use of plugin artifacts.
When adding an artifact that contains an application class, the first argument is the id of the artifact; the second is the application class; and the remaining arguments are packages that should be included in the Export-Packages manifest attribute bundled in the JAR. The framework traces the dependencies of the specified application class to create a JAR with those dependencies. This mimics what happens when you actually build your application JAR using Maven.
An application can then be deployed using that artifact.
Plugins extending the artifact can also be added. In that case, the first argument is the id of the plugin artifact; the second is the parent artifact it is extending; and the remaining arguments are classes that should be bundled in the JAR. The packages of all these classes are included in the Export-Packages manifest attribute bundled in the JAR. When adding a plugin artifact this way, it is important to include all classes in your plugin packages, even if they are not used in your test case. This ensures that the framework can trace all required dependencies and build the JAR correctly.
The examples are taken from the DataPipelineTest and HydratorTestBase classes of CDAP pipelines.
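As a small illustration of the packaging rule described above, and independent of those classes, the set of exported packages can be thought of as the distinct packages of the classes passed in. The following sketch uses only the Java standard library; the class ExportedPackages and its method are hypothetical and do not reproduce CDAP's artifact-building code.

```java
import java.util.TreeSet;

// Generic illustration only; this does not use or reproduce CDAP's artifact-building code.
class ExportedPackages {

  /** Returns the distinct package names of the given classes, sorted and comma-separated. */
  static String of(Class<?>... classes) {
    TreeSet<String> packages = new TreeSet<>();
    for (Class<?> c : classes) {
      packages.add(c.getPackage().getName());
    }
    return String.join(",", packages);
  }
}
```

For example, ExportedPackages.of(String.class, java.util.List.class, java.util.Map.class) returns "java.lang,java.util": two classes from the same package contribute that package only once.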
Validating Test Data with SQL
Often the easiest way to verify that a test produced the right data is to run a SQL query, provided the datasets involved in the test case are record-scannable, as described in Data Exploration. This can be done using a JDBC connection obtained from the test base.
The JDBC connection does not implement the full JDBC functionality: it does not allow variable replacement and will not allow you to make any changes to datasets. But it is sufficient to perform test validation: you can create or prepare statements and execute queries, then iterate over the result set and validate its correctness.
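Iterating over a result set and validating it can be sketched with plain JDBC types. The helper below reads one string column from every row so that a test can compare it against the expected values. It is a generic sketch, not CDAP code: the names QueryResults, column, and fakeSingleColumn are assumptions made for this example, and the in-memory fake exists only so that the sketch runs without a database. In a real test, the ResultSet would come from a query executed on the connection obtained from the test base.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Generic JDBC sketch; the helper names are hypothetical and not part of CDAP.
class QueryResults {

  /** Reads one string column from every row, in order, then closes the result set. */
  static List<String> column(ResultSet results, int columnIndex) {
    List<String> values = new ArrayList<>();
    try (ResultSet rs = results) {
      while (rs.next()) {
        values.add(rs.getString(columnIndex));
      }
    } catch (SQLException e) {
      throw new IllegalStateException("Could not read query results", e);
    }
    return values;
  }

  /** In-memory stand-in for a single-column query result, so the sketch runs without a database. */
  static ResultSet fakeSingleColumn(List<String> rows) {
    Iterator<String> it = rows.iterator();
    String[] current = new String[1];
    InvocationHandler handler = (proxy, method, args) -> {
      switch (method.getName()) {
        case "next":
          if (!it.hasNext()) {
            return false;
          }
          current[0] = it.next();
          return true;
        case "getString":
          return current[0];
        case "close":
          return null;
        default:
          throw new UnsupportedOperationException(method.getName());
      }
    };
    return (ResultSet) Proxy.newProxyInstance(
        QueryResults.class.getClassLoader(), new Class<?>[] {ResultSet.class}, handler);
  }
}
```

A test can then compare the returned list with the expected values in a single assertion, which keeps the validation readable even when the query returns many rows.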
Configuring CDAP Runtime for Test Framework
The TestBase
 class inherited by your test class starts an in-memory CDAP runtime before executing any test methods. Sometimes you may need to configure the CDAP runtime to suit your specific requirements. For example, if your test does not involve usage of SQL queries, you can turn off the explore service to reduce startup and shutdown times.
You alter the configuration of the CDAP runtime by applying a JUnit @ClassRule on a TestConfiguration instance.
Refer to the cdap-site.xml for the available set of configurations used by CDAP.
Created in 2020 by Google Inc.