...

  • Static instantiation

  • Dynamic instantiation

Static Instantiation

You can instruct the CDAP runtime system to inject the dataset into a class member with the @UseDataSet annotation:

...

When starting the program, the runtime system reads the dataset specification from the metadata store and injects an instance of the dataset class into the application. This dataset will participate in every transaction that is executed by the program. If the program is multi-threaded (for example, an HTTP service handler), CDAP will make sure that every thread has its own instance of the dataset.
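Annotation-driven injection of this kind can be pictured with plain Java reflection. The sketch below is a hypothetical model, not CDAP's implementation. `InjectDataset`, `ToyTable`, `ToyInjector`, and `CounterProgram` are illustrative names. The factory simply constructs a new object, where the real runtime would build the dataset from the specification it reads from the metadata store.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.function.Function;

// Hypothetical stand-in for @UseDataSet (NOT the real CDAP annotation):
// marks a field that should receive a dataset instance with the given name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectDataset {
  String value(); // dataset name
}

// Hypothetical toy dataset type.
final class ToyTable {
  final String name;

  ToyTable(String name) {
    this.name = name;
  }
}

// Toy injector: assigns a freshly created instance to every annotated field.
final class ToyInjector {
  private final Function<String, Object> factory;

  ToyInjector(Function<String, Object> factory) {
    this.factory = factory;
  }

  void inject(Object program) {
    for (Field field : program.getClass().getDeclaredFields()) {
      InjectDataset annotation = field.getAnnotation(InjectDataset.class);
      if (annotation == null) {
        continue; // not a dataset field
      }
      field.setAccessible(true); // allow writing private fields
      try {
        field.set(program, factory.apply(annotation.value()));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("Cannot inject " + field.getName(), e);
      }
    }
  }
}

// Example "program" whose dataset member is filled in by the injector.
final class CounterProgram {
  @InjectDataset("counters")
  private ToyTable counters;

  ToyTable counters() {
    return counters;
  }
}
```

Calling `new ToyInjector(ToyTable::new).inject(program)` before the program runs leaves `program.counters()` pointing at a `ToyTable` named `"counters"`. The program code itself never constructs the dataset.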

Dynamic Instantiation

If you don't know the name of the dataset at compile time (and hence you cannot use static instantiation), or if you want to use a dataset only for a short time, you can dynamically request an instance of the dataset through the program context:

...

As with static datasets, if a program is multi-threaded, CDAP makes sure that every thread has its own instance of each dynamic dataset. Consequently, to discard a dataset from the cache, every thread that uses it must individually call discardDataset().
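This per-thread behavior can be modeled with a `ThreadLocal` cache. The sketch assumes a hypothetical `ToyDatasetContext` class. Its `getDataset()` and `discardDataset()` methods borrow the names used in the text for readability, but they are not CDAP's signatures. Each thread has its own map, so a discard in one thread leaves every other thread's cached instance in place. This is why each thread has to discard the dataset itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy dataset type.
final class ToyDataset {
  final String name;

  ToyDataset(String name) {
    this.name = name;
  }
}

// Toy context (not CDAP's API): every thread sees its own private cache.
final class ToyDatasetContext {
  private final ThreadLocal<Map<String, ToyDataset>> cache =
      ThreadLocal.withInitial(HashMap::new);

  // Returns the calling thread's cached instance, creating it on first use.
  ToyDataset getDataset(String name) {
    return cache.get().computeIfAbsent(name, ToyDataset::new);
  }

  // Removes the instance from the calling thread's cache only.
  void discardDataset(ToyDataset dataset) {
    cache.get().remove(dataset.name, dataset);
  }

  // Diagnostic helper: is the dataset cached for the calling thread?
  boolean isCached(String name) {
    return cache.get().containsKey(name);
  }
}
```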

Multi-threading and Dataset Access

As mentioned above, with both static and dynamic instantiation, CDAP makes sure that every thread of a multi-threaded program has its own instance of a dataset. This is because datasets are not thread-safe: an instance cannot be shared across threads, so each thread must operate on its own instance.

...

Because transactions are not thread-safe, neither the dataset context of a transaction nor any dataset obtained through it may be shared across threads.

Cross-namespace Dataset Access

The dataset usage methods described above access datasets in the same namespace as the program. Dynamic instantiation, however, also lets a program access datasets in a different namespace from the one in which it is running. This is typically needed when a dataset is large enough that sharing it across namespaces is preferable to giving every namespace its own copy. To use a dataset from a different namespace, pass a namespace parameter to getDataset():

...
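Namespace resolution can be sketched as a lookup keyed by the pair (namespace, name), where a call that omits the namespace defaults to the program's own namespace. `NamespaceAwareContext`, `NamespacedDataset`, and `register()` are hypothetical. The two `getDataset()` overloads only mirror the idea of passing an extra namespace parameter and are not CDAP's actual method signatures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical dataset handle that records which namespace it was resolved in.
final class NamespacedDataset {
  final String namespace;
  final String name;

  NamespacedDataset(String namespace, String name) {
    this.namespace = namespace;
    this.name = name;
  }
}

// Toy context (not CDAP's API): models namespace-qualified dataset lookup.
final class NamespaceAwareContext {
  private final String programNamespace;
  // Known datasets, keyed by the pair (namespace, name).
  private final Set<List<String>> known = new HashSet<>();

  NamespaceAwareContext(String programNamespace) {
    this.programNamespace = programNamespace;
  }

  // Stands in for creating a dataset in some namespace.
  void register(String namespace, String name) {
    known.add(Arrays.asList(namespace, name));
  }

  // One-argument form: resolves the name in the program's own namespace.
  NamespacedDataset getDataset(String name) {
    return getDataset(programNamespace, name);
  }

  // Two-argument form: resolves the name in an explicitly given namespace.
  // A new handle is returned on every call, so no handle is shared between callers.
  NamespacedDataset getDataset(String namespace, String name) {
    if (!known.contains(Arrays.asList(namespace, name))) {
      throw new IllegalArgumentException(
          "No dataset '" + name + "' in namespace '" + namespace + "'");
    }
    return new NamespacedDataset(namespace, name);
  }
}
```

In this model, a shared dataset exists once, in its home namespace. A program in another namespace reaches it only by naming that namespace explicitly, and does not keep a local copy.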