Workers
Workers can be used to run background processes. Workers provide the ability to write data processing logic that doesn't fit into the other paradigms such as MapReduce or real-time and batch processing.
You can add workers to your application by calling the addWorker
 method in the application's configure
 method:
public class AnalyticsApp extends AbstractApplication {
@Override
public void configure() {
setName("AnalyticsApp");
...
addWorker(new ProcessWorker());
...
}
}
Workers have semantics similar to a Java thread and are run in a thread when CDAP is run in either in-memory or local sandbox modes. In distributed mode, each instance of a worker runs in its own YARN container. Their instances may be updated via the Command Line Interface or a Microservices:
public class ProcessWorker extends AbstractWorker {
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
}
@Override
public void run() {
...
}
@Override
public void stop() {
...
}
@Override
public void destroy() {
...
}
}
Workers and Resources
The size of the YARN container and the number of virtual cores allocated to a worker can be specified using a setResources
 method in the worker initialize
 method:
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
setResources(new Resources(1024, 2));
...
}
In this example, each worker container will be initialized with 1024 MB of memory and 2 virtual cores.
Workers and Datasets
Workers can access and use datasets via a DatasetContext
 inside the run
 method of a TxRunnable
. AÂ TxRunnable
 can be executed using the execute
 method of a WorkerContext
. Note that WorkerContext
 is thread-safe and that each thread will receive its own instance of the Dataset
 being accessed. Though it is necessary to use the getDataset
 method of DatasetContext
 to access a Dataset
 instance inside a TxRunnable
, this operation is performance-optimized with the help of internal caching logic. In this example, the dataset tableName is accessed from within a TxRunnable
:
Operations executed on datasets within a run
 are committed as part of a single transaction. The transaction is started before run
 is invoked and is committed upon successful execution. Exceptions thrown while committing the transaction or thrown by user-code result in a rollback of the transaction. It is recommended that TransactionConflictException
 be caught and handled appropriately; for example, you can retry a dataset
 operation.
Services can be discovered from inside of workers. Services can either belong to the same application or to another application in the same namespace. WorkerContext
 can be used to discover the URL of the service of interest:
Â
Created in 2020 by Google Inc.