...
Static instantiation
Dynamic instantiation
Static Instantiation
You can instruct the CDAP runtime system to inject the dataset into a class member with the @UseDataSet annotation:
...
When the program starts, the runtime system reads the dataset specification from the metadata store and injects an instance of the dataset class into the annotated class member. This dataset participates in every transaction that the program executes. If the program is multi-threaded (for example, an HTTP service handler), CDAP ensures that every thread has its own instance of the dataset.
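To make the injection step concrete, here is a toy model of annotation-driven field injection in plain Java. It is a sketch under stated assumptions, not CDAP's runtime: `@InjectDataset`, `ToyTable`, `ToyHandler`, and `ToyInjector` are hypothetical names standing in for `@UseDataSet`, a real dataset class, a program class, and the runtime system. The injector uses reflection to find annotated fields and assigns a fresh instance to each, so injecting two handler objects yields two distinct instances, much as each thread of a multi-threaded program gets its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CDAP's @UseDataSet annotation (illustration only).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectDataset {
  String value(); // name of the dataset to inject
}

// Hypothetical in-memory dataset; a real dataset would be backed by storage.
class ToyTable {
  final String name;
  final Map<String, String> rows = new HashMap<>();

  ToyTable(String name) {
    this.name = name;
  }
}

// Hypothetical program class with one annotated field.
class ToyHandler {
  @InjectDataset("purchases")
  ToyTable purchases;
}

// Toy injector: NOT CDAP's runtime, just the reflection pattern the text describes.
final class ToyInjector {
  private ToyInjector() {}

  // Assigns a new ToyTable to every @InjectDataset field declared on the target's class.
  static void inject(Object target) {
    for (Field field : target.getClass().getDeclaredFields()) {
      InjectDataset annotation = field.getAnnotation(InjectDataset.class);
      if (annotation == null) {
        continue;
      }
      if (!field.getType().isAssignableFrom(ToyTable.class)) {
        throw new IllegalStateException("field " + field.getName() + " cannot hold a ToyTable");
      }
      try {
        field.setAccessible(true);
        field.set(target, new ToyTable(annotation.value()));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("cannot inject " + field.getName(), e);
      }
    }
  }

  public static void main(String[] args) {
    ToyHandler first = new ToyHandler();
    ToyHandler second = new ToyHandler();
    inject(first);
    inject(second);
    // Same dataset name, but distinct instances; prints "purchases true".
    System.out.println(first.purchases.name + " " + (first.purchases != second.purchases));
  }
}
```

Declaring the dataset as an annotated field keeps program code free of lookup calls; the trade-off is that the dataset name must be a compile-time constant.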
Dynamic Instantiation
If you don't know the name of the dataset at compile time (and hence you cannot use static instantiation), or if you want to use a dataset only for a short time, you can dynamically request an instance of the dataset through the program context:
...
As with static datasets, if a program is multi-threaded, CDAP ensures that every thread has its own instance of each dynamic dataset. Consequently, to discard a dataset from the cache, every thread that uses it must individually call discardDataset().
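The per-thread behavior of dynamic datasets can be modeled with a `ThreadLocal` cache. In this minimal sketch, `PerThreadDatasetCache`, its `getDataset(String)` and `discardDataset(String)` methods, and the `Threads` helper are hypothetical: the method names echo the text, but they are not CDAP's signatures, and keying the cache by dataset name is an assumption of the sketch. A discard evicts the dataset only from the calling thread's cache, which is why every thread must discard its own copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical per-thread dataset cache; names mirror the prose, not CDAP's exact API.
final class PerThreadDatasetCache<D> {
  private final Function<String, D> factory;
  // ThreadLocal gives every thread its own map, so no instance is ever shared.
  private final ThreadLocal<Map<String, D>> cache = ThreadLocal.withInitial(HashMap::new);

  PerThreadDatasetCache(Function<String, D> factory) {
    this.factory = factory;
  }

  // Returns the calling thread's instance, creating it on first request.
  D getDataset(String name) {
    return cache.get().computeIfAbsent(name, factory);
  }

  // Evicts the dataset from the calling thread's cache only; other threads keep theirs.
  void discardDataset(String name) {
    cache.get().remove(name);
  }

  boolean isCached(String name) {
    return cache.get().containsKey(name);
  }
}

// Hypothetical helper for running a task on another thread in examples.
final class Threads {
  private Threads() {}

  // Runs task on a fresh thread, waits for it, and returns its result.
  static <T> T callOnNewThread(Supplier<T> task) {
    AtomicReference<T> result = new AtomicReference<>();
    Thread worker = new Thread(() -> result.set(task.get()));
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return result.get();
  }
}
```

Because each map is reachable only from its own thread, the cache needs no locking; the cost is that eviction has to be repeated on every thread that touched the dataset.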
Multi-threading and Dataset Access
As mentioned above, with both static and dynamic instantiation, CDAP ensures that every thread of a multi-threaded program has its own instance of a dataset. This is because datasets are not thread-safe: a dataset instance cannot be shared across threads, so each thread must operate on its own instance.
...
Because transactions are not thread-safe, neither the dataset context of a transaction nor any dataset obtained through it may be shared across threads.
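One way to see why a non-thread-safe dataset must stay on one thread is to make that rule explicit. The hypothetical `ThreadConfinedDataset` below records the thread that created it and rejects calls from any other thread, while `Confinement.PER_THREAD` shows the safe pattern of handing each thread its own instance through a `ThreadLocal`. This is an illustrative guard only; it assumes nothing about how CDAP itself detects or prevents sharing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical dataset that is deliberately not thread-safe and refuses use off its creating thread.
final class ThreadConfinedDataset {
  private final Thread owner = Thread.currentThread();
  private final StringBuilder pendingWrites = new StringBuilder(); // unsynchronized buffer

  void write(String value) {
    checkOwner();
    pendingWrites.append(value).append('\n');
  }

  int pendingLength() {
    checkOwner();
    return pendingWrites.length();
  }

  // Fails fast instead of silently corrupting the unsynchronized buffer.
  private void checkOwner() {
    if (Thread.currentThread() != owner) {
      throw new IllegalStateException("dataset owned by " + owner.getName()
          + " used from " + Thread.currentThread().getName());
    }
  }
}

final class Confinement {
  // Safe pattern: each thread lazily creates, and therefore owns, its own instance.
  static final ThreadLocal<ThreadConfinedDataset> PER_THREAD =
      ThreadLocal.withInitial(ThreadConfinedDataset::new);

  private Confinement() {}

  // Runs action on a new thread and reports whether it hit an IllegalStateException.
  static boolean failsOnOtherThread(Runnable action) {
    AtomicBoolean failed = new AtomicBoolean(false);
    Thread worker = new Thread(() -> {
      try {
        action.run();
      } catch (IllegalStateException e) {
        failed.set(true);
      }
    });
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return failed.get();
  }
}
```

Sharing `shared` from the main thread with a worker fails, whereas a worker that calls `Confinement.PER_THREAD.get()` receives an instance it owns and can write freely.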
Cross-namespace Dataset Access
The dataset usage methods described above access datasets in the same namespace as the program. Dynamic dataset instantiation, however, also lets a program access datasets in a namespace other than the one in which it is running. This is typically useful when datasets are large enough to warrant sharing across namespaces, rather than every namespace keeping its own copy. To use a dataset from a different namespace, pass a namespace parameter to getDataset():
...
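Namespace-qualified lookup can be sketched as a registry keyed first by namespace and then by dataset name, where the one-argument `getDataset(name)` falls back to the program's own namespace. `NamespacedDatasetContext` and its `register` method are hypothetical, and the `(namespace, name)` argument order of the two-argument overload is an assumption of this sketch rather than a statement about CDAP's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of datasets grouped by namespace; not CDAP's DatasetContext.
final class NamespacedDatasetContext {
  private final String programNamespace;
  private final Map<String, Map<String, Object>> datasetsByNamespace = new HashMap<>();

  NamespacedDatasetContext(String programNamespace) {
    this.programNamespace = programNamespace;
  }

  // Makes a dataset visible under the given namespace and name.
  void register(String namespace, String name, Object dataset) {
    datasetsByNamespace.computeIfAbsent(namespace, ns -> new HashMap<>()).put(name, dataset);
  }

  // One-argument form: resolves the name in the program's own namespace.
  Object getDataset(String name) {
    return getDataset(programNamespace, name);
  }

  // Two-argument form: resolves the name in an explicitly given namespace.
  Object getDataset(String namespace, String name) {
    Map<String, Object> inNamespace = datasetsByNamespace.getOrDefault(namespace, Map.of());
    Object dataset = inNamespace.get(name);
    if (dataset == null) {
      throw new IllegalArgumentException(
          "dataset '" + name + "' not found in namespace '" + namespace + "'");
    }
    return dataset;
  }
}
```

In this model a shared dataset such as one registered under a `shared` namespace is stored once and reached from any program that names that namespace explicitly, while the one-argument call never leaves the program's own namespace.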