...
Workers have semantics similar to a Java thread and are run in a thread when CDAP is run in either in-memory or local sandbox modes. In distributed mode, each instance of a worker runs in its own YARN container. Their instances may be updated via the Command Line Interface or a RESTful APIMicroservices:
Code Block |
---|
public class ProcessWorker extends AbstractWorker {
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
}
@Override
public void run() {
...
}
@Override
public void stop() {
...
}
@Override
public void destroy() {
...
}
}
|
Workers and Resources
The size of the YARN container and the number of virtual cores allocated to a worker can be specified using a setResources
method in the worker initialize
method:
Code Block |
---|
@Override
public void initialize(WorkerContext context) {
super.initialize(context);
...
setResources(new Resources(1024, 2));
...
}
|
In this example, each worker container will be initialized with 1024 MB of memory and 2 virtual cores.
...
Code Block |
---|
@Override
public void run() {
getContext().execute(new TxRunnable() {
@Override
public void run(DatasetContext context) {
Dataset dataset = context.getDataset("tableName");
...
}
});
}
|
Operations executed on datasets within a run
are committed as part of a single transaction. The transaction is started before run
is invoked and is committed upon successful execution. Exceptions thrown while committing the transaction or thrown by user-code result in a rollback of the transaction. It is recommended that TransactionConflictException
be caught and handled appropriately; for example, you can retry a dataset
operation.
...