Applications

An Application is a collection of building blocks that read and write data through the data abstraction layer in CDAP.

Applications are composed from Programs, Services, and Schedules.

Programs include MapReduce programs, Workflows, Spark programs, and Workers, which are used to process data. Services are used to serve data.

Data abstractions include Datasets.

Applications are created using an Artifact and optional configuration. An Artifact is a JAR file that packages the Java Application class that defines how the Programs, Services, Schedules, and Datasets interact.

It also packages any dependent classes and libraries needed to run the Application.

Implementing an Application Class

To implement an application class, extend the AbstractApplication class, specifying the application metadata and declaring and configuring each of the application components:

public class MyApp extends AbstractApplication {
  @Override
  public void configure() {
    setName("myApp");
    setDescription("My Sample Application");
    createDataset("myAppDataset", Table.class);
    addService(new MyService());
    addMapReduce(new MyMapReduce());
    addWorkflow(new MyAppWorkflow());
  }
}

Components are defined by user-written classes that implement the corresponding interfaces. Each component is added to the application by passing an instance of its class, and each is assigned a unique name.

Names used for datasets need to be unique across the CDAP namespace, while names used for programs and services need to be unique only to the application.
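The two uniqueness scopes can be illustrated with a small standalone sketch. The NameRegistry class and its methods are hypothetical and are not part of the CDAP API; the sketch only models the rule that dataset names are checked against the whole namespace, while program and service names are checked per application.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a single CDAP namespace; not part of the CDAP API.
class NameRegistry {
  // Dataset names are shared by every application in the namespace.
  private final Set<String> datasetNames = new HashSet<>();
  // Program and service names are tracked separately for each application.
  private final Map<String, Set<String>> programNamesByApp = new HashMap<>();

  // Returns true if the name was free in the namespace, false if it is taken.
  boolean registerDataset(String datasetName) {
    return datasetNames.add(datasetName);
  }

  // Returns true if the name was free within the given application.
  boolean registerProgram(String appName, String programName) {
    return programNamesByApp
        .computeIfAbsent(appName, k -> new HashSet<>())
        .add(programName);
  }
}

class NameRegistryDemo {
  public static void main(String[] args) {
    NameRegistry namespace = new NameRegistry();
    // The same service name can be used by two different applications.
    System.out.println(namespace.registerProgram("appA", "MyService"));
    System.out.println(namespace.registerProgram("appB", "MyService"));
    // A dataset name can be claimed only once in the namespace.
    System.out.println(namespace.registerDataset("myAppDataset"));
    System.out.println(namespace.registerDataset("myAppDataset"));
  }
}
```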

A Typical CDAP Application Class

A typical design of a CDAP application class consists of:

MapReduce programs, Spark programs, and Workflows for batch processing tasks;
Workers for processing data in an ad-hoc manner that doesn't fit into real-time or batch paradigms;
Datasets for storage of data, either raw or the processed results; and
Services for serving data and processed results.

Not all components are required; which ones you need depends on the application. A minimal application could include a workflow and a dataset. The next sections look at these components and their interactions.

Application Version

Applications can be created with a version string. This is useful when you need to create a newer version of an existing application, tell the versions apart, and run them at the same time. Programs of a specific version of an application can be started and stopped using the version-aware calls of the Lifecycle Microservices.

If a version is not provided while creating an application (i.e., the application is created using a non-version-aware API), a default version of "-SNAPSHOT" is used.

If an application version is specified that matches one that already exists, it will be overwritten only if the version string ends with "-SNAPSHOT". Otherwise, versions are immutable, and the only way to change a version is to delete the application of that version and then redeploy it.
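The versioning rules above can be summarized in a short standalone sketch. The AppVersionStore class is a hypothetical in-memory model, not the CDAP API, and it does not model how CDAP reports a rejected deployment; it only captures the default version, the overwrite rule for versions ending in "-SNAPSHOT", and the delete-then-redeploy path for immutable versions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the versioning rules; not the CDAP API.
class AppVersionStore {
  static final String DEFAULT_VERSION = "-SNAPSHOT";

  // Maps "appName:version" to a label for whatever was deployed there.
  private final Map<String, String> deployed = new HashMap<>();

  private static String key(String appName, String version) {
    return appName + ":" + version;
  }

  // Deploys without a version, so the default version is used.
  boolean deploy(String appName, String artifact) {
    return deploy(appName, DEFAULT_VERSION, artifact);
  }

  // Returns true if the deployment is accepted, false if it is rejected
  // because an immutable (non-snapshot) version already exists.
  boolean deploy(String appName, String version, String artifact) {
    String key = key(appName, version);
    if (deployed.containsKey(key) && !version.endsWith("-SNAPSHOT")) {
      return false;
    }
    deployed.put(key, artifact);
    return true;
  }

  // Deleting a version makes that version string available again.
  void delete(String appName, String version) {
    deployed.remove(key(appName, version));
  }

  // Returns the deployed label, or null if nothing is deployed there.
  String get(String appName, String version) {
    return deployed.get(key(appName, version));
  }
}

class AppVersionStoreDemo {
  public static void main(String[] args) {
    AppVersionStore store = new AppVersionStore();
    store.deploy("myApp", "1.0.0", "build-1");
    // Rejected: version 1.0.0 exists and is immutable.
    System.out.println(store.deploy("myApp", "1.0.0", "build-2"));
    // Accepted after deleting the existing version.
    store.delete("myApp", "1.0.0");
    System.out.println(store.deploy("myApp", "1.0.0", "build-2"));
  }
}
```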

Information about the version-aware CDAP Microservices to create, list, and delete applications using versions can be found in the Lifecycle Microservices documentation.

Application Configuration

Application classes can use a Config class to receive a configuration when an Application is created. For example, a configuration can specify, at application creation time, which dataset is to be read, rather than having it hard-coded in the AbstractApplication's configure method.

The configuration class must be the type parameter of the AbstractApplication class, and it should extend the Config class present in the CDAP API. The configuration is provided as part of the request body used to create an application, and it is available at configuration time through the getConfig() method in AbstractApplication.

Information about the HTTP call is available in the Lifecycle Microservices documentation.

We can modify the MyApp class above to accept a configuration of type MyApp.MyAppConfig:

public class MyApp extends AbstractApplication<MyApp.MyAppConfig> {

  public static class MyAppConfig extends Config {
    String datasetName;

    public MyAppConfig() {
      // Default values
      this.datasetName = "myAppDataset";
    }
  }

  @Override
  public void configure() {
    MyAppConfig config = getConfig();
    setName("myApp");
    setDescription("My Sample Application");
    createDataset(config.datasetName, Table.class);
    addService(new MyService(config.datasetName));
    addMapReduce(new MyMapReduce(config.datasetName));
    addWorkflow(new MyAppWorkflow());
  }
}
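The role of the default assigned in the MyAppConfig constructor can be shown with a standalone sketch. It assumes that a field missing from the supplied configuration keeps its constructor default; SketchAppConfig and its from() helper are hypothetical stand-ins that do not extend the CDAP Config class, so the sketch compiles without CDAP on the classpath.

```java
import java.util.Collections;
import java.util.Map;

// Standalone stand-in for MyAppConfig; it does not extend the CDAP Config
// class so that the sketch has no CDAP dependency.
class SketchAppConfig {
  String datasetName;

  SketchAppConfig() {
    // Default value, kept when no value is supplied at creation time.
    this.datasetName = "myAppDataset";
  }

  // Hypothetical helper: applies supplied values on top of the defaults.
  // This is assumed to mirror what happens when the configuration from
  // the request body is turned into a config object.
  static SketchAppConfig from(Map<String, String> supplied) {
    SketchAppConfig config = new SketchAppConfig();
    String name = supplied.get("datasetName");
    if (name != null) {
      config.datasetName = name;
    }
    return config;
  }
}

class SketchAppConfigDemo {
  public static void main(String[] args) {
    // No value supplied: the constructor default is used.
    Map<String, String> empty = Collections.emptyMap();
    System.out.println(SketchAppConfig.from(empty).datasetName);
    // Value supplied: it replaces the default.
    Map<String, String> custom = Collections.singletonMap("datasetName", "events");
    System.out.println(SketchAppConfig.from(custom).datasetName);
  }
}
```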

To use the configuration in programs, pass it to the individual programs through their constructors. If a configuration parameter is also required at runtime, annotate the field that holds it with @Property. In the example below, uniqueCountTableName is set through the constructor when the application is configured. Because the field is annotated with @Property, its value is also available at runtime, where it is used to get the dataset instance using the getDataset() method:

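public class UniqueCounter extends AbstractFlowlet {
  // Annotated with @Property so that the value set at configuration time
  // is also available at runtime.
  @Property
  private final String uniqueCountTableName;

  // Dataset instance, obtained at runtime in initialize().
  private UniqueCountTable uniqueCountTable;

  @Override
  public void configure(FlowletConfigurer configurer) {
    super.configure(configurer);
  }

  // Called from the application's configure method with the configured name.
  public UniqueCounter(String uniqueCountTableName) {
    this.uniqueCountTableName = uniqueCountTableName;
  }

  @Override
  public void initialize(FlowletContext context) throws Exception {
    super.initialize(context);
    // Look up the dataset by the name carried over from configuration time.
    uniqueCountTable = context.getDataset(uniqueCountTableName);
  }

  @ProcessInput
  public void process(String word) {
    this.uniqueCountTable.updateUniqueCount(word);
  }
}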
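The idea behind carrying a field value from configuration time to runtime can be sketched with plain Java reflection. This is a simplified model under stated assumptions, not CDAP's implementation: the SketchProperty annotation, the SketchCounter class, and the capture() and restore() helpers are all hypothetical, and only String fields are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the @Property annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchProperty {}

// A component with one annotated field and one field that is not carried over.
class SketchCounter {
  @SketchProperty
  String uniqueCountTableName;
  String notCarriedOver;
}

class PropertySketch {
  // Configuration time: record the value of every annotated String field.
  static Map<String, String> capture(Object component) {
    Map<String, String> properties = new HashMap<>();
    try {
      for (Field field : component.getClass().getDeclaredFields()) {
        if (field.isAnnotationPresent(SketchProperty.class)) {
          field.setAccessible(true);
          properties.put(field.getName(), (String) field.get(component));
        }
      }
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
    return properties;
  }

  // Runtime: copy the recorded values into a freshly created instance.
  static void restore(Object component, Map<String, String> properties) {
    try {
      for (Field field : component.getClass().getDeclaredFields()) {
        if (field.isAnnotationPresent(SketchProperty.class)
            && properties.containsKey(field.getName())) {
          field.setAccessible(true);
          field.set(component, properties.get(field.getName()));
        }
      }
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }
}

class PropertySketchDemo {
  public static void main(String[] args) {
    SketchCounter configured = new SketchCounter();
    configured.uniqueCountTableName = "uniqueCounts";
    Map<String, String> saved = PropertySketch.capture(configured);

    // A new instance, as would exist in a separate runtime process.
    SketchCounter atRuntime = new SketchCounter();
    PropertySketch.restore(atRuntime, saved);
    System.out.println(atRuntime.uniqueCountTableName);
  }
}
```

Fields without the annotation are left untouched by restore(), which is why a value needed at runtime has to be annotated rather than only assigned in the constructor.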