...

Workers have semantics similar to a Java thread and are run in a thread when CDAP is run in either in-memory or local sandbox mode. In distributed mode, each instance of a worker runs in its own YARN container. The number of worker instances may be updated via the Command Line Interface or Microservices:

Code Block
public class ProcessWorker extends AbstractWorker {

  @Override
  public void initialize(WorkerContext context) {
    super.initialize(context);
    ...
  }

  @Override
  public void run() {
    ...
  }

  @Override
  public void stop() {
    ...
  }

  @Override
  public void destroy() {
    ...
  }
}
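
Because a worker behaves much like a Java thread, a long-running run method needs some way to notice that stop has been requested. The following is a minimal plain-Java sketch of that pattern, not CDAP code: the class StoppableLoopSketch and its methods are hypothetical stand-ins, and it assumes that stop is called from a different thread than the one executing run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java stand-in for a worker; it does not use any CDAP classes.
public class StoppableLoopSketch implements Runnable {

  // volatile so that a write made by the stopping thread is visible to the running thread
  private volatile boolean running = true;
  private final AtomicInteger iterations = new AtomicInteger();

  @Override
  public void run() {
    while (running) {
      iterations.incrementAndGet();  // placeholder for one unit of work
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt and exit
        return;
      }
    }
  }

  // Signals the loop in run() to exit after its current iteration.
  public void stop() {
    running = false;
  }

  public int getIterations() {
    return iterations.get();
  }

  public static void main(String[] args) throws InterruptedException {
    StoppableLoopSketch worker = new StoppableLoopSketch();
    Thread thread = new Thread(worker, "worker-sketch");
    thread.start();
    Thread.sleep(100);
    worker.stop();
    thread.join(1000);
    System.out.println("alive after stop: " + thread.isAlive());
  }
}
```

In an actual worker, the same flag could be set in the overridden stop method and checked in the loop inside run.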

Workers and Resources

The size of the YARN container and the number of virtual cores allocated to a worker can be specified by calling the setResources method in the worker's initialize method:

Code Block
@Override
public void initialize(WorkerContext context) {
  super.initialize(context);
  ...
  setResources(new Resources(1024, 2));
  ...
}

In this example, each worker container will be initialized with 1024 MB of memory and 2 virtual cores.

...

Code Block
@Override
public void run() {
  getContext().execute(new TxRunnable() {
    @Override
    public void run(DatasetContext context) {
      Dataset dataset = context.getDataset("tableName");
      ...
    }
  });
}

Dataset operations executed within the run method of a TxRunnable are committed as part of a single transaction. The transaction is started before run is invoked and is committed when run completes successfully. An exception thrown by user code, or thrown while committing the transaction, causes the transaction to be rolled back. It is recommended that you catch TransactionConflictException and handle it appropriately; for example, by retrying the dataset operation.
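
The retry suggestion above can be sketched as a small helper. This is an illustration only: ConflictRetrySketch, retryOnConflict, and the nested ConflictException are hypothetical names, and ConflictException is a local stand-in so that the example compiles without CDAP on the classpath. In a worker, the operation passed in would presumably wrap the getContext().execute(...) call, and the catch clause would name the real TransactionConflictException.

```java
import java.util.concurrent.Callable;

public class ConflictRetrySketch {

  // Hypothetical stand-in for a transaction conflict; not the CDAP exception class.
  public static class ConflictException extends RuntimeException {
    public ConflictException(String message) {
      super(message);
    }
  }

  // Runs the operation, retrying up to maxAttempts times when a conflict is reported.
  // Any other exception propagates immediately without a retry.
  public static <T> T retryOnConflict(Callable<T> operation, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    ConflictException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.call();
      } catch (ConflictException e) {
        last = e;  // the failed attempt was rolled back; try again
      }
    }
    throw last;  // every attempt conflicted
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = retryOnConflict(() -> {
      if (++calls[0] < 3) {
        throw new ConflictException("simulated conflict");
      }
      return "committed";
    }, 5);
    System.out.println(result + " after " + calls[0] + " attempts");
    // prints "committed after 3 attempts"
  }
}
```

Bounding the number of attempts keeps a persistently conflicting operation from looping forever; whether to add a delay between attempts depends on how contended the dataset is.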

...