Every program in CDAP goes through a program lifecycle, with a specific set of methods called in succession at different points of the lifecycle.
Though there are slight differences between program types, in general all programs follow the same lifecycle. The exceptions are services which have sub-programs called service handlers.
When an application is deployed, the application is configured, followed by the configuring of all programs and sub-programs of the application, recursively. The configuration happens by calling the configure() method of each entity, which creates a specification for that entity. These specifications are bundled together to create the application.
This results in application, program, and sub-program specifications. In a specification, you can set the name, description, resources, programs and sub-programs used, plugins used and required. (An example of the latter is a requirement for a JDBC driver.)
Any member variables created in the configure() methods are available only at deployment. No dataset operations are possible, as there is no access to datasets or transactions.
When an application is run, each program of an application (and any sub-programs) is executed by calling first initialize() and (eventually) destroy() for each program and sub-program. In these methods, dataset operations and transactions are allowed.
The following of the program lifecycle is a combination of CDAP's conventions and the implementation of specific interfaces. They can be summarized as:
At deployment time:
configure() By convention, all CDAP applications and programs have this method. It produces an immutable program specification.
initialize() A requirement of the ProgramLifecycle interface.
destroy() A requirement of the ProgramLifecycle interface.
The initialize() method is called once, at the start of the program. The destroy() method is called once, at the end of program before it is shutdown. If there is any cleanup required, it can be implemented in this method.
Services do not have an initialize() because they have service handlers which have an initialize() method instead.
Note that the instance of the object called at deployment is not the same instance of the object called at runtime. Because the result of the deployment stage is an immutable program specification, any local member variables set during deployment will not be available during runtime. This behavior can cause unexpected null-pointer exceptions. The solution is instead to set these as properties in the specification, which is available at runtime, as in the examples on initializing instance fields. For example, in a Spark program:
The relationship between transactions and lifecycle depends on the method involved:
configure() No transactions
initialize() Inside a transaction
destroy() Inside a transaction
The exception to this are Workers, which have to execute their own, explicit transactions. See workers and datasets for details.
For convenience, most program types have a corresponding abstract program class. It is recommended to always extend the abstract class instead of implementing the program interface. The abstract classes provide:
proxy methods for the program configurer's methods;
an initialize() method that stores the program context in a class member and makes it available via getContext(); and
a destroy() method that does nothing.
This table summarizes, for each program or sub-program type, the methods available, abstract class, and their signatures:
Note: classes extendingAbstractHttpServiceHandlerare only required to implementconfigure()