Checklist
We want to remove the use of the Upgrade Tool so that we can move towards the goal of zero or minimal downtime.
For this specific work, the goal is to stop upgrading metadata state in the Upgrade Tool and instead perform that upgrade in background threads started by the individual stores - DatasetBasedTimeSchedule, DatasetBasedStreamSizeSchedule, AppMetadataStore.
Currently the Upgrade Tool performs three high-level operations -
a) upgrade the coprocessors of CDAP Datasets
b) modify stream store (this will be removed since this step was present even in 3.5)
c) add app versions to three datasets - DatasetBasedTimeScheduleStore, DatasetBasedStreamSizeScheduleStore, AppMetadataDataset
Step a) is performed sequentially, one dataset at a time, so its contribution to the Upgrade Tool run time grows in proportion to the number of datasets in CDAP.
Step c) needs to be moved into the respective data stores, and the Upgrade Tool should no longer perform that operation.
Parallelizing Coprocessor Upgrade:
Currently this step involves calling disableTable (a synchronous call), changing the table descriptor, and enabling the table again, for all tables one by one. This is expected to take a long time, especially when there are a lot of CDAP-managed HBase tables: the per-table time adds up and might exceed the time window allotted for the upgrade. To optimize this, we can use a thread pool and submit a 'disable -> change table descriptor -> enable' job for each table to that executor pool, so that the coprocessor upgrade operations run in parallel. This can reduce the time taken by the coprocessor upgrade step in the Upgrade Tool. The number of threads in the thread pool can be made configurable so that it can be tuned as required.
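The thread-pool approach described above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: `TableAdmin`, `upgradeAll`, and `numThreads` are hypothetical names introduced here, and `TableAdmin` is a stand-in for whatever HBase admin calls the Upgrade Tool actually makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCoprocessorUpgrade {

  /** Hypothetical stand-in for the HBase admin operations used by the Upgrade Tool. */
  interface TableAdmin {
    void disableTable(String table) throws Exception;
    void updateDescriptor(String table) throws Exception;
    void enableTable(String table) throws Exception;
  }

  /**
   * Submits one 'disable -> change descriptor -> enable' job per table to a
   * fixed-size pool and waits for all of them. Returns the tables whose job failed.
   * numThreads is the configurable pool size (an assumed setting).
   */
  static List<String> upgradeAll(List<String> tables, TableAdmin admin, int numThreads)
      throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    try {
      List<Callable<Void>> jobs = new ArrayList<>();
      for (String table : tables) {
        jobs.add(() -> {
          admin.disableTable(table);
          admin.updateDescriptor(table);
          admin.enableTable(table);
          return null;
        });
      }
      // invokeAll blocks until every job is done and returns futures in job order.
      List<Future<Void>> futures = executor.invokeAll(jobs);
      List<String> failed = new ArrayList<>();
      for (int i = 0; i < futures.size(); i++) {
        try {
          futures.get(i).get();
        } catch (ExecutionException e) {
          failed.add(tables.get(i));
        }
      }
      return failed;
    } finally {
      executor.shutdown();
    }
  }
}
```

In this sketch a table whose descriptor change fails is left disabled and is only reported in the returned list; a real implementation would need to decide whether to re-enable such a table or retry it.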
Adding App Version to System Datasets using Background Upgrade Threads:
The following steps apply to each of the datasets to which the app version needs to be added, while the stores still continue to read the old data format.
Step 1) Since we can't upgrade the datasets in the Upgrade Tool, we need to do it after CDAP starts up. That means the dataset store must be able to work with both the old format and the new versioned format.
Step 2) The store checks whether the app version upgrade is needed, based on a key in the table that records the last CDAP version the dataset was upgraded to. If that version is not the latest, the store starts a background thread that updates the entries.
Step 3) During normal dataset operations (for example, pause schedule or delete schedule or add schedule etc), the following things must be kept in mind:
Data Format:
Background Threads:
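Steps 1 and 2 above can be illustrated with a small Java sketch. Everything in it is an assumption made for illustration: the in-memory map stands in for the HBase-backed table, and the key layout (`ns:app:schedule` versus `ns:app:version:schedule`), the marker key `upgrade.version`, the version string `4.1.0`, and the default app version `-SNAPSHOT` are hypothetical values, not the stores' real formats.

```java
import java.util.ArrayList;
import java.util.Map;

final class VersionedScheduleStore {
  static final String VERSION_KEY = "upgrade.version";   // hypothetical marker key
  static final String CURRENT_VERSION = "4.1.0";         // hypothetical target version
  static final String DEFAULT_APP_VERSION = "-SNAPSHOT"; // assumed default app version

  private final Map<String, String> table;  // in-memory stand-in for the dataset table
  private volatile Thread upgradeThread;

  VersionedScheduleStore(Map<String, String> table) {
    this.table = table;
  }

  static String oldKey(String ns, String app, String schedule) {
    return ns + ":" + app + ":" + schedule;
  }

  static String newKey(String ns, String app, String version, String schedule) {
    return ns + ":" + app + ":" + version + ":" + schedule;
  }

  /** Step 2: start the background upgrade only if the marker is not at the latest version. */
  void start() {
    if (!CURRENT_VERSION.equals(table.get(VERSION_KEY))) {
      upgradeThread = new Thread(this::upgrade, "schedule-store-upgrader");
      upgradeThread.setDaemon(true);
      upgradeThread.start();
    }
  }

  /** Step 1: read the versioned key first, then fall back to the old unversioned key. */
  String get(String ns, String app, String version, String schedule) {
    String value = table.get(newKey(ns, app, version, schedule));
    if (value == null && DEFAULT_APP_VERSION.equals(version)) {
      value = table.get(oldKey(ns, app, schedule));
    }
    return value;
  }

  /** Rewrites every old-format entry under a versioned key, then records the version. */
  void upgrade() {
    for (String key : new ArrayList<>(table.keySet())) {
      String[] parts = key.split(":");
      if (parts.length != 3) {
        continue;  // already versioned, or the marker key itself
      }
      String value = table.get(key);
      if (value != null) {
        // Write the new entry before deleting the old one so the data is never absent.
        table.put(newKey(parts[0], parts[1], DEFAULT_APP_VERSION, parts[2]), value);
        table.remove(key);
      }
    }
    table.put(VERSION_KEY, CURRENT_VERSION);
  }

  /** Waits for the background upgrade, if one was started. */
  void awaitUpgrade() throws InterruptedException {
    Thread t = upgradeThread;
    if (t != null) {
      t.join();
    }
  }
}
```

The sketch does not make a read atomic with respect to the rewrite of a single entry; a real store would presumably perform each entry's rewrite and each read inside a transaction.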
No Programmatic API changes.
None
URL | Description | Response |
---|---|---|
/v3/system/upgrade/status | 3.5 Installation with time and stream schedules and existing applications, run records, workflow tokens, workflow node state. Upgrade to 4.1 and verify the normal functionality of CDAP | {"from" : "3.5.1", "to" : "4.1.1", "inprogress" : [ "DatasetTimeSchedule", "DatasetStreamSizeSchedule" ], "completed" : [ "AppMetaStore" ]} |
None
None, since the upgrade operations will happen in AppFabric in background threads and that process already has the privileges to modify these datasets.
A background upgrade thread will record the upgraded CDAP version only after the entire upgrade of its dataset is complete. Until then, the respective store will start the upgrade thread again on each startup. If an error occurs while writing to HBase, the upgrade thread will retry the operation according to a specific retry strategy.
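A minimal Java sketch of this behaviour follows. The document does not specify the retry strategy, so the exponential backoff used here is an assumption, and `callWithRetries`, `upgradeThenMark`, `maxAttempts`, and `initialDelayMs` are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.concurrent.Callable;

final class RetryingUpgrade {

  /** Runs op, retrying failures with exponential backoff, up to maxAttempts attempts. */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long initialDelayMs)
      throws Exception {
    long delay = initialDelayMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e;  // never retry an interruption
      } catch (Exception e) {
        if (attempt >= maxAttempts) {
          throw e;  // retries exhausted, surface the last error
        }
        Thread.sleep(delay);
        delay *= 2;
      }
    }
  }

  /**
   * Runs every entry upgrade with retries and writes the version marker only if
   * all of them succeeded. Returns false when the marker was not written, so the
   * store would start the upgrade thread again later.
   */
  static boolean upgradeThenMark(List<Callable<Void>> entryUpgrades, Runnable writeVersionMarker,
                                 int maxAttempts, long initialDelayMs)
      throws InterruptedException {
    try {
      for (Callable<Void> entryUpgrade : entryUpgrades) {
        callWithRetries(entryUpgrade, maxAttempts, initialDelayMs);
      }
    } catch (InterruptedException e) {
      throw e;
    } catch (Exception e) {
      return false;
    }
    writeVersionMarker.run();
    return true;
  }
}
```

Because the marker is written last, an upgrade that stops partway through is simply attempted again on the next start, which requires each entry upgrade to be safe to repeat.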
Test ID | Test Description | Expected Results |
---|---|---|
1 | 3.5 Installation with time and stream schedules and existing applications, run records, workflow tokens, workflow node state. Upgrade to 4.1 and verify the normal functionality of CDAP | 4.1 should work fine with full functionality |
2 | Same test as above; scan the three stores after some time to make sure the data in those datasets has been upgraded | All the dataset entries should have app versions |
3 | 4.0.1 Installation with the same setup as test 1 | 4.1 should work fine with full functionality |