Should we enable dataset upgrade by default, or document how to enable it?

Description

A related change makes it possible to bypass a limitation in the dataset framework: once a dataset type or module is deployed, it cannot be updated. As a special case, if an application contains a custom dataset type, the code for that type is extracted and deployed as a dataset module when the application is deployed for the first time. When the application is redeployed, the application code is updated, but the dataset code is not. That can have irritating consequences.

The change allows forcing the upgrade of the dataset code in this scenario by setting a property in cdap-site.xml. This feature was added in 2.6.1 but not documented. The same change has since been applied to 2.8.

The question is whether this should be documented, or even enabled by default. Intuitively, anybody who redeploys an application would expect all of its code to be updated. Since this only applies to datasets that were defined as part of an application, these datasets are most likely used only by that application (and possibly by Explore), which would mean that updating the code does no damage to other apps.

We need to decide this soon, before we freeze 2.8.0.

Release Notes

None

Activity

Andreas Neumann April 30, 2015 at 6:43 PM

Decided not to enable it by default. A separate issue tracks completing this work post-3.0.

Priyanka Nambiar March 26, 2015 at 2:46 AM

Deprioritizing for 3.0

Andreas Neumann March 17, 2015 at 1:15 AM

Deferring to 3.0

Andreas Neumann March 17, 2015 at 12:22 AM

After an attempt to add a sanity check for whether the new dataset is structurally compatible with the existing one (same type, embedded datasets have same type, but properties may be different), it turns out this change is not doable in the scope of this release.

Without this check, too many things can go wrong in an unchecked upgrade, most importantly:

  • loss of all data in the dataset

  • orphaned tables in HBase that have data but are not referenced by a dataset any more

Therefore we are reverting the decision and making this disabled by default.
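
For illustration only, the sanity check described in this comment could look roughly like the sketch below. `DatasetSpec` and `is_structurally_compatible` are hypothetical names, not CDAP APIs, and the requirement that embedded datasets match by name is an assumption; the comment only states that the type and the embedded datasets' types must be the same while properties may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatasetSpec:
    """Hypothetical stand-in for a dataset specification (not a CDAP class)."""
    type_name: str
    properties: Dict[str, str] = field(default_factory=dict)
    embedded: Dict[str, "DatasetSpec"] = field(default_factory=dict)


def is_structurally_compatible(old: DatasetSpec, new: DatasetSpec) -> bool:
    """Return True if `new` could replace `old` without changing structure.

    Rules taken from the comment: same type, embedded datasets have the
    same type; properties are deliberately ignored.
    Assumption: embedded datasets are matched by name.
    """
    if old.type_name != new.type_name:
        return False
    if old.embedded.keys() != new.embedded.keys():
        return False
    # Recurse so that nested embedded datasets are checked as well.
    return all(
        is_structurally_compatible(old.embedded[name], new.embedded[name])
        for name in old.embedded
    )
```

An upgrade would be refused whenever this returns False, since that is the case in which data loss or orphaned HBase tables become possible.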

Andreas Neumann March 16, 2015 at 7:38 PM

We had some discussions and decided to make it the default. Reasons:

  • it is unacceptable that datasets cannot be updated

  • it is counter-intuitive that when an application is updated, its datasets keep the old code

  • forcing the update of the datasets only has implications for other applications if they do not bundle the dataset code themselves. This is uncommon: the app needs the dataset to build, so it is possible but unlikely that the dataset is in a separate artifact with provided scope.

  • any application that bundles the dataset code will use its version of that code

The possibly troublesome scenario is if two applications share the dataset, and one app has the older code but is updated later. In that case, the deployed version of the dataset would be the older code, and Explore might not function properly with that. The workaround would be to simply redeploy the app that has the newer version of the dataset code.
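
The scenario can be traced with a toy model. This is only a sketch of the deployment order described above: `ModuleRegistry`, `deploy_app`, and the app and dataset names are hypothetical, and the real dataset framework is far more involved.

```python
class ModuleRegistry:
    """Toy model of deployed dataset modules (hypothetical, not a CDAP class)."""

    def __init__(self, force_upgrade: bool):
        self.force_upgrade = force_upgrade
        self.deployed = {}  # dataset type name -> version of the deployed code

    def deploy_app(self, app: str, dataset_type: str, code_version: int) -> None:
        # Without forced upgrade, only the first deployment installs the code.
        # With forced upgrade, every (re)deployment overwrites it.
        if self.force_upgrade or dataset_type not in self.deployed:
            self.deployed[dataset_type] = code_version


registry = ModuleRegistry(force_upgrade=True)
registry.deploy_app("app-old", "myDataset", 1)
registry.deploy_app("app-new", "myDataset", 2)

# The app with the older dataset code is redeployed later ...
registry.deploy_app("app-old", "myDataset", 1)
stale_version = registry.deployed["myDataset"]  # older code is now deployed

# ... and the workaround is to redeploy the app with the newer code.
registry.deploy_app("app-new", "myDataset", 2)
fixed_version = registry.deployed["myDataset"]
```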

Hence: The decision is to enable this by default. We will not add documentation for this because it is unlikely to affect existing customers.

Fixed

Created March 5, 2015 at 7:02 PM
Updated April 30, 2015 at 6:43 PM
Resolved April 30, 2015 at 6:43 PM