Workers
Workers can be used to run background processes. Workers provide the ability to write data processing logic that doesn't fit into the other paradigms such as MapReduce or real-time and batch processing.
You can add workers to your application by calling the addWorker
method in the application's configure
method:
public class AnalyticsApp extends AbstractApplication {
@Override
public void configure() {
setName("AnalyticsApp");
...
addWorker(new ProcessWorker());
...
}
}
Workers have semantics similar to a Java thread and are run in a thread when CDAP is run in either in-memory or local sandbox modes. In distributed mode, each instance of a worker runs in its own YARN container. Their instances may be updated via the Command Line Interface or a Microservices:
public class ProcessWorker extends AbstractWorker {
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
}
@Override
public void run() {
...
}
@Override
public void stop() {
...
}
@Override
public void destroy() {
...
}
}
Workers and Resources
The size of the YARN container and the number of virtual cores allocated to a worker can be specified using a setResources
method in the worker initialize
method:
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
setResources(new Resources(1024, 2));
...
}
In this example, each worker container will be initialized with 1024 MB of memory and 2 virtual cores.
Workers and Datasets
Workers can access and use datasets via a DatasetContext
inside the run
method of a TxRunnable
. A TxRunnable
can be executed using the execute
method of a WorkerContext
. Note that WorkerContext
is thread-safe and that each thread will receive its own instance of the Dataset
being accessed. Though it is necessary to use the getDataset
method of DatasetContext
to access a Dataset
instance inside a TxRunnable
, this operation is performance-optimized with the help of internal caching logic. In this example, the dataset tableName is accessed from within a TxRunnable
:
Operations executed on datasets within a run
are committed as part of a single transaction. The transaction is started before run
is invoked and is committed upon successful execution. Exceptions thrown while committing the transaction or thrown by user-code result in a rollback of the transaction. It is recommended that TransactionConflictException
be caught and handled appropriately; for example, you can retry a dataset
operation.
Services can be discovered from inside of workers. Services can either belong to the same application or to another application in the same namespace. WorkerContext
can be used to discover the URL of the service of interest:
Created in 2020 by Google Inc.