Add a way to retrieve the properties with which a dataset was created
Description
In CDAP-3051, we want to add an API to allow updating the properties of a dataset. That goes hand in hand with an API to retrieve the current properties. For example, if an app wants to add an index column to an index table, it first needs to know the existing set of index columns.
However, it is not trivial to retrieve the current properties of a dataset. The dataset service does not store the properties that were used to configure the dataset in its metadata. What it actually does is call the dataset definition's configure() method and then store the dataset spec returned from that. That spec has a properties field, but it does not necessarily reflect the properties that were passed in.
In order to reconfigure or to clone a dataset, the client needs to be able to retrieve the original properties with which the dataset was created. This Jira adds an API to do so.
Release Notes
Adds an API to retrieve the properties that were used to configure (or reconfigure) a dataset.
Missed one case: when retrieving the list of datasets in a namespace (GET /v3/namespace/default/data/datasets), we also need to call fixProperties(). Currently it simply uses spec.getProperties() for the result.
Because the existing dataset framework does not store the original dataset properties, this consists of two parts:
store the original dataset properties as part of the spec
for existing datasets (after an upgrade), implement a method to derive the original properties from the dataset spec
The second part is possible for all built-in datasets, even though some of them manipulate the properties before creating the spec. For user-defined datasets, we can only make a best effort, because we do not know the code that configured them.
Here are the built-in datasets that do not preserve the original properties 1:1:
FileSet: adds a FILESET_VERSION property
TimePartitionedFileSet: adds the PARTITIONING property
ObjectMappedTable: adds TABLE_SCHEMA and TABLE_SCHEMA_ROW_FIELD
LineageDataset: adds CONFLICT_LEVEL=NONE
UsageDataset: adds CONFLICT_LEVEL=NONE
Only the first three are public datasets used by developers; the last two are used only by the system.
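For illustration only, the second part (deriving the original properties from an existing spec) could be sketched as below. This is a minimal sketch, not the actual CDAP code: the class name, the method name deriveOriginal, the use of plain type names as lookup keys, and the use of the constant names from the list above as literal property keys are all assumptions. It simply removes the keys that each built-in dataset is known to add.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OriginalProperties {

  // Hypothetical lookup: dataset type name -> property keys added by its configure().
  // The key strings reuse the constant names listed above; the real values may differ.
  private static final Map<String, Set<String>> ADDED_KEYS = new HashMap<>();

  static {
    ADDED_KEYS.put("FileSet", new HashSet<>(Arrays.asList("FILESET_VERSION")));
    ADDED_KEYS.put("TimePartitionedFileSet", new HashSet<>(Arrays.asList("PARTITIONING")));
    ADDED_KEYS.put("ObjectMappedTable",
                   new HashSet<>(Arrays.asList("TABLE_SCHEMA", "TABLE_SCHEMA_ROW_FIELD")));
    ADDED_KEYS.put("LineageDataset", new HashSet<>(Arrays.asList("CONFLICT_LEVEL")));
    ADDED_KEYS.put("UsageDataset", new HashSet<>(Arrays.asList("CONFLICT_LEVEL")));
  }

  /**
   * Returns a copy of the spec's properties without the keys that the given
   * dataset type is known to add. Unknown (for example user-defined) types are
   * returned unchanged, which is the best effort described above.
   */
  public static Map<String, String> deriveOriginal(String typeName,
                                                   Map<String, String> specProperties) {
    Map<String, String> result = new HashMap<>(specProperties);
    Set<String> added = ADDED_KEYS.get(typeName);
    if (added != null) {
      result.keySet().removeAll(added);
    }
    return result;
  }
}
```

One limitation of this approach: if a caller had explicitly passed one of these keys when creating the dataset, it would be stripped as well, so the result is only an approximation of the original properties.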