Program Lifecycle

Every program in CDAP goes through a program lifecycle, with a specific set of methods called in succession at different points of the lifecycle.

Although there are slight differences between program types, all programs generally follow the same lifecycle. The exception is services, which have sub-programs called service handlers.

When an application is deployed, the application is configured first, followed recursively by all of its programs and sub-programs. Configuration happens by calling the configure() method of each entity, which creates a specification for that entity. These specifications are bundled together to create the application.

This results in application, program, and sub-program specifications. In a specification, you can set the name, description, resources, the programs and sub-programs used, and the plugins used and required. (An example of the latter is a requirement for a JDBC driver.)
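The recursive configuration step can be pictured with a small model. The sketch below is not CDAP code: Spec, Configurable, bundle() and the Purchase* names are hypothetical stand-ins. It only illustrates the idea that each entity's configure() returns an immutable specification, and that a parent bundles its children's specifications into its own.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of deployment-time configuration.
// Spec, Configurable, Handler, Service and App are illustrative stand-ins,
// not CDAP classes.
public class ConfigureSketch {

  /** Immutable specification produced by configure(). */
  static final class Spec {
    final String name;
    final String description;
    final List<Spec> children;

    Spec(String name, String description, List<Spec> children) {
      this.name = name;
      this.description = description;
      this.children = List.copyOf(children); // unmodifiable copy
    }
  }

  /** Anything configured at deployment: an application, program or sub-program. */
  interface Configurable {
    Spec configure();
  }

  /** Configures each child, then bundles the child specs into the parent's spec. */
  static Spec bundle(String name, String description, List<Configurable> children) {
    List<Spec> childSpecs = new ArrayList<>();
    for (Configurable child : children) {
      childSpecs.add(child.configure()); // each child recurses into its own children
    }
    return new Spec(name, description, childSpecs);
  }

  static final class Handler implements Configurable {
    public Spec configure() {
      return bundle("PurchaseHandler", "Answers purchase queries", List.of());
    }
  }

  static final class Service implements Configurable {
    public Spec configure() {
      return bundle("PurchaseService", "Hosts the handler", List.of(new Handler()));
    }
  }

  static final class App implements Configurable {
    public Spec configure() {
      return bundle("PurchaseApp", "Top-level application", List.of(new Service()));
    }
  }

  /** Counts the specs in a tree, including the root. */
  static int countSpecs(Spec spec) {
    int n = 1;
    for (Spec child : spec.children) {
      n += countSpecs(child);
    }
    return n;
  }

  public static void main(String[] args) {
    Spec app = new App().configure();
    System.out.println(app.name + " bundles " + countSpecs(app) + " specs"); // PurchaseApp bundles 3 specs
  }
}
```

Because each Spec copies its children into an unmodifiable list, the bundled result cannot be changed after deployment, which mirrors the immutability of real program specifications.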

Any member variables set in the configure() methods are available only at deployment. No dataset operations are possible at that point, as there is no access to datasets or transactions.

When an application is run, each of its programs (and any sub-programs) is executed by first calling initialize() and, eventually, destroy(). In these methods, dataset operations and transactions are allowed.

The program lifecycle follows from a combination of CDAP's conventions and the implementation of specific interfaces. It can be summarized as:

  • At deployment time:

    • configure(): By convention, all CDAP applications and programs have this method. It produces an immutable program specification.

  • At runtime:

    • initialize(): A requirement of the ProgramLifecycle interface.

    • destroy(): A requirement of the ProgramLifecycle interface.

The initialize() method is called once, at the start of the program. The destroy() method is called once, at the end of the program, before it is shut down. Any cleanup the program requires can be implemented in this method.
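The call order can be sketched with a toy runner. This is a hypothetical model, not CDAP's runtime: the Lifecycle interface only mirrors the shape of an initialize()/destroy() pair, and RecordingProgram and execute() are invented for illustration. Calling destroy() from a finally block, so that cleanup also happens after a failure, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the runtime lifecycle. The Lifecycle
// interface mirrors the initialize()/destroy() pair described above, but it is
// an illustrative stand-in, not CDAP's ProgramLifecycle.
public class LifecycleSketch {

  interface Lifecycle<C> {
    void initialize(C context) throws Exception;
    void destroy();
  }

  /** A toy program that records every lifecycle call it receives. */
  static final class RecordingProgram implements Lifecycle<String> {
    final List<String> calls = new ArrayList<>();
    private final boolean failInRun;

    RecordingProgram(boolean failInRun) {
      this.failInRun = failInRun;
    }

    @Override
    public void initialize(String context) {
      calls.add("initialize(" + context + ")");
    }

    void run() {
      calls.add("run");
      if (failInRun) {
        throw new IllegalStateException("simulated failure");
      }
    }

    @Override
    public void destroy() {
      calls.add("destroy"); // cleanup such as closing connections would go here
    }
  }

  /**
   * Calls initialize() once, runs the program, then calls destroy() once.
   * Once initialize() has returned, destroy() is called from a finally block,
   * so it also runs after a failure; that choice is an assumption of this sketch.
   */
  static void execute(RecordingProgram program, String context) {
    program.initialize(context);
    try {
      program.run();
    } finally {
      program.destroy();
    }
  }

  public static void main(String[] args) {
    RecordingProgram program = new RecordingProgram(false);
    execute(program, "ctx");
    System.out.println(program.calls); // [initialize(ctx), run, destroy]
  }
}
```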

Services do not have an initialize() method; instead, each of their service handlers has one.

Note that the object instance called at deployment is not the same instance called at runtime. Because the result of the deployment stage is an immutable program specification, any member variables set during deployment will not be available at runtime, which can cause unexpected null-pointer exceptions. Instead, set these values as properties in the specification, which is available at runtime, as in the examples on initializing instance fields. For example, in a Spark program, the properties can be retrieved with:

getContext().getSpecification().getProperties()
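The pitfall can be reproduced with a small, self-contained model. Specification, Context and MyProgram below are hypothetical stand-ins that only imitate the getContext().getSpecification().getProperties() call shown above; they are not the CDAP classes. One instance runs configure() and is discarded, a fresh instance runs initialize(), and only values stored in the specification survive the hand-off. The property key "table.name" is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of why fields set in configure() are lost at
// runtime. Specification, Context and MyProgram are illustrative stand-ins,
// not CDAP classes.
public class PropertiesSketch {

  static final class Specification {
    private final Map<String, String> properties;

    Specification(Map<String, String> properties) {
      this.properties = Map.copyOf(properties); // immutable once deployed
    }

    Map<String, String> getProperties() {
      return properties;
    }
  }

  static final class Context {
    private final Specification spec;

    Context(Specification spec) {
      this.spec = spec;
    }

    Specification getSpecification() {
      return spec;
    }
  }

  static final class MyProgram {
    private String tableName; // set only at deployment time
    private Context context;

    /** Deployment: runs on one instance and returns an immutable specification. */
    Specification configure() {
      tableName = "purchases"; // lost: this instance is discarded after deployment
      Map<String, String> props = new HashMap<>();
      props.put("table.name", "purchases"); // kept: stored in the specification
      return new Specification(props);
    }

    /** Runtime: called on a different instance. */
    void initialize(Context context) {
      this.context = context;
    }

    String fieldValue() {
      return tableName;
    }

    String propertyValue() {
      return context.getSpecification().getProperties().get("table.name");
    }
  }

  public static void main(String[] args) {
    Specification spec = new MyProgram().configure(); // deployment instance
    MyProgram runtime = new MyProgram();              // separate runtime instance
    runtime.initialize(new Context(spec));
    System.out.println(runtime.fieldValue());    // null
    System.out.println(runtime.propertyValue()); // purchases
  }
}
```

Dereferencing fieldValue() at runtime (for example, calling a method on it) is exactly where the unexpected NullPointerException would appear.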

Transactions

The relationship between transactions and lifecycle depends on the method involved:

  • configure(): No transactions

  • initialize(): Inside a transaction

  • destroy(): Inside a transaction

The exception is Workers, which must execute their own explicit transactions. See workers and datasets for details.

Details on transactions in these methods are covered in the section on using the transaction system in programs.
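The rules above can be summarized in a toy model that logs where transaction boundaries fall. TxLog, inTransaction(), runManagedProgram() and runWorker() are hypothetical names; this is not CDAP's transaction API, only an illustration of "the framework wraps initialize() and destroy()" versus "a worker opens its own transaction".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of which lifecycle calls run inside a
// transaction. TxLog and its methods are illustrative stand-ins, not CDAP's
// transaction API; the worker behavior is modeled on the rule stated above.
public class TransactionSketch {

  /** Records begin/commit/rollback markers and the work done between them. */
  static final class TxLog {
    final List<String> events = new ArrayList<>();

    void inTransaction(String label, Runnable work) {
      events.add("begin");
      try {
        work.run();
        events.add("commit " + label);
      } catch (RuntimeException e) {
        events.add("rollback " + label);
        throw e;
      }
    }
  }

  /** Most programs: configure() has no transaction; initialize() and destroy() are wrapped. */
  static void runManagedProgram(TxLog tx) {
    tx.events.add("configure (no transaction)");
    tx.inTransaction("initialize", () -> tx.events.add("initialize"));
    tx.inTransaction("destroy", () -> tx.events.add("destroy"));
  }

  /** Workers: nothing is wrapped for them; dataset work goes in an explicit transaction. */
  static void runWorker(TxLog tx) {
    tx.events.add("worker run (no implicit transaction)");
    tx.inTransaction("worker write", () -> tx.events.add("write dataset"));
  }

  public static void main(String[] args) {
    TxLog tx = new TxLog();
    runManagedProgram(tx);
    runWorker(tx);
    tx.events.forEach(System.out::println);
  }
}
```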

Program Types

For convenience, most program types have a corresponding abstract program class. It is recommended to always extend the abstract class instead of implementing the program interface. The abstract classes provide:

  • proxy methods for the program configurer's methods;

  • an initialize() method that stores the program context in a class member and makes it available via getContext(); and

  • a destroy() method that does nothing.
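The convenience that the abstract classes provide can be illustrated with a minimal model. ProgramContext, LifecycleProgram, AbstractProgram and MyProgram are hypothetical stand-ins; CDAP's own abstract classes (AbstractSpark, AbstractWorker and so on) are richer, and their exact members may differ. The sketch shows a base class that stores the context in initialize(), exposes it via getContext(), and supplies a destroy() that does nothing, so a subclass only overrides what it needs.

```java
// Simplified, hypothetical model of the abstract-class pattern. ProgramContext,
// LifecycleProgram and AbstractProgram are illustrative stand-ins, not CDAP
// classes.
public class AbstractProgramSketch {

  /** Hypothetical runtime context handed to initialize(). */
  static final class ProgramContext {
    private final String runId;

    ProgramContext(String runId) {
      this.runId = runId;
    }

    String getRunId() {
      return runId;
    }
  }

  /** The bare program interface: only the lifecycle contract. */
  interface LifecycleProgram {
    void initialize(ProgramContext context);
    void destroy();
  }

  /** Convenience base class: stores the context and supplies a no-op destroy(). */
  abstract static class AbstractProgram implements LifecycleProgram {
    private ProgramContext context;

    @Override
    public void initialize(ProgramContext context) {
      this.context = context; // made available to subclasses via getContext()
    }

    protected final ProgramContext getContext() {
      return context;
    }

    @Override
    public void destroy() {
      // does nothing by default; override when cleanup is needed
    }
  }

  /** In this model, an overriding initialize() must call super.initialize() first. */
  static final class MyProgram extends AbstractProgram {
    String greeting;

    @Override
    public void initialize(ProgramContext context) {
      super.initialize(context); // keeps getContext() working
      greeting = "started run " + getContext().getRunId();
    }
  }

  public static void main(String[] args) {
    MyProgram program = new MyProgram();
    program.initialize(new ProgramContext("run-1"));
    System.out.println(program.greeting); // started run run-1
    program.destroy();                    // inherited no-op
  }
}
```

Implementing LifecycleProgram directly would force every program to repeat the context bookkeeping and an empty destroy(), which is the duplication the abstract classes are meant to remove.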

This table summarizes, for each program or sub-program type, the abstract class to extend and the methods it provides:

Program Type  Sub-program Type  Abstract Class              Methods
------------  ----------------  --------------------------  -------------------------------------------
MapReduce                       AbstractMapReduce           configure(), initialize(), destroy()
              Mapper            Mapper                      initialize(), map(), destroy()
              Reducer           Reducer                     initialize(), reduce(), destroy()
Service                         AbstractService             configure(), destroy() [1]
              ServiceHandler    AbstractHttpServiceHandler  configure(), initialize(), destroy(),
                                                            @GET, @PUT, @POST or @DELETE
                                                            @Path("handlerPath") handlerMethod() [2]
Spark                           AbstractSpark               configure(), initialize(), destroy()
              SparkMain         SparkMain                   run()
              JavaSparkMain     JavaSparkMain               run()
Worker                          AbstractWorker              configure(), initialize(), destroy()
Workflow                        AbstractWorkflow            configure(), initialize(), destroy()
              Custom Action     AbstractCustomAction        configure(), initialize(), run(), destroy()

[1] Note: no initialize().
[2] Note: classes extending AbstractHttpServiceHandler are only required to implement configure().

Created in 2020 by Google Inc.