RS-001 Coprocessor Rolling Upgrade

Checklist

User Stories Documented
User Stories Reviewed
Design Reviewed
APIs reviewed
Release priorities assigned
Test cases reviewed
Blog post

Introduction 

One reason CDAP must be stopped before an upgrade is so that the upgrade tool can update the coprocessors for all CDAP tables. To minimize downtime, we would like to upgrade coprocessors in a rolling fashion.

Goals

Design a method to upgrade CDAP HBase coprocessors in a rolling fashion, with minimal downtime.

User Stories 

  • As a cluster administrator, I want to be able to upgrade CDAP coprocessors without stopping CDAP

  • As a cluster administrator, I want to be able to upgrade HBase without stopping CDAP

Design

Prior to 4.1.0, coprocessors are built and uploaded to HDFS when the dataset is created. When the HBase table is created, it is configured with the HDFS path of the coprocessor(s), the classname of the coprocessor(s), and the priority. During a CDAP upgrade, CDAP is stopped and an upgrade tool is run that loops through all tables. For each table, the tool disables the table, builds and uploads the new coprocessor jars, modifies the table to point to the new coprocessor(s) on HDFS, then re-enables the table. The advantage is that CDAP manages coprocessors itself, so cluster administrators don't need to know anything about coprocessors. The drawback is that upgrading the coprocessors requires downtime.
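The per-table sequence above can be sketched as follows. This is only an illustration of the ordering. `TableAdmin` and `OfflineUpgrade` are hypothetical names, not the HBase Admin API or the real upgrade tool, and the sketch assumes the new jar has already been built and uploaded.

```java
import java.util.List;

// Hypothetical admin interface for illustration; not the HBase Admin API.
interface TableAdmin {
  void disable(String table);
  void setCoprocessorJar(String table, String jarPath);
  void enable(String table);
}

// Hypothetical sketch of the pre-4.1.0 upgrade loop.
final class OfflineUpgrade {
  private OfflineUpgrade() { }

  // Upgrades each table in turn. A table is unavailable between
  // disable() and enable(), which is why CDAP must be stopped.
  static void upgradeAll(List<String> tables, String newJarPath, TableAdmin admin) {
    for (String table : tables) {
      admin.disable(table);
      admin.setCoprocessorJar(table, newJarPath);
      admin.enable(table);
    }
  }
}
```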

Approach

We first describe the approach for CDAP rolling upgrade, assuming that no HBase upgrade is happening.

Rolling CDAP upgrade

We will change the coprocessors used by Tables to be wrappers that look up the CDAP version, download the relevant coprocessor jar from HDFS, instantiate the relevant class, then delegate all calls to the instantiated class. In more detail, on startup CDAP will load all required coprocessors to predetermined locations on HDFS:

/cdap/lib/coprocessors/table-<cdap-version>-<hbase-version>.jar

For example, the actual coprocessor implementations will be placed on HDFS at:

/cdap/lib/coprocessors/table-4.1.0-1.1.0.jar

/cdap/lib/coprocessors/table-4.1.1-1.1.0.jar

/cdap/lib/coprocessors/table-4.1.2-1.1.0.jar

The wrapper coprocessor will also be placed on HDFS, but the same jar can be used for all versions of CDAP:

/cdap/lib/coprocessors/base-1.1.0.jar
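The naming convention above could be captured in a small helper such as the one below. `CoprocessorPaths` and its method names are hypothetical. The sketch also assumes that the version in the wrapper jar name (`base-1.1.0.jar`) is the HBase version, which this document does not state explicitly.

```java
// Hypothetical helper for illustration; not an existing CDAP class.
final class CoprocessorPaths {
  // Base directory taken from the example paths in this document.
  static final String BASE_DIR = "/cdap/lib/coprocessors";

  private CoprocessorPaths() { }

  // Versioned jar that holds the actual table coprocessor implementation.
  static String tableJar(String cdapVersion, String hbaseVersion) {
    return String.format("%s/table-%s-%s.jar", BASE_DIR, cdapVersion, hbaseVersion);
  }

  // Wrapper jar shared by all CDAP versions.
  // Assumption: the version in the file name is the HBase version.
  static String wrapperJar(String hbaseVersion) {
    return String.format("%s/base-%s.jar", BASE_DIR, hbaseVersion);
  }
}
```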

The wrapper coprocessor will be the one that each HBase table is configured to use. When it starts up, it will read the CDAP version from a predefined table, download the required coprocessor jar, create a classloader from it, and instantiate the actual coprocessor class. This change is completely transparent to CDAP users and cluster administrators.
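A minimal sketch of the delegation pattern described above, under stated assumptions. `TableHooks` is a hypothetical stand-in for the HBase coprocessor hooks, and `DelegatingCoprocessor` is an illustrative name. The version lookup and the jar download are passed in as functions so that the sketch stays independent of HBase and HDFS. `loadFromJar` shows one possible way to build the classloader with the standard `URLClassLoader`, assuming the jar has already been copied to a location reachable by URL.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the coprocessor hooks the wrapper forwards.
interface TableHooks {
  String preGet(String row);
}

// Wrapper that resolves the real implementation on start, then delegates.
final class DelegatingCoprocessor implements TableHooks {
  private final Supplier<String> versionLookup;       // e.g. reads the predefined table
  private final Function<String, TableHooks> loader;  // maps a CDAP version to an implementation
  private TableHooks delegate;

  DelegatingCoprocessor(Supplier<String> versionLookup,
                        Function<String, TableHooks> loader) {
    this.versionLookup = versionLookup;
    this.loader = loader;
  }

  // Called when the coprocessor starts: look up the version, load the delegate.
  void start() {
    String cdapVersion = versionLookup.get();
    delegate = loader.apply(cdapVersion);
  }

  @Override
  public String preGet(String row) {
    if (delegate == null) {
      throw new IllegalStateException("start() has not been called");
    }
    return delegate.preGet(row);
  }

  // One possible loader body: create a classloader over the jar and
  // instantiate the named class through its no-argument constructor.
  // The classloader is left open because the loaded class keeps using it.
  static TableHooks loadFromJar(URL jarUrl, String className) throws Exception {
    URLClassLoader classLoader =
        new URLClassLoader(new URL[] {jarUrl}, TableHooks.class.getClassLoader());
    Class<?> implClass = Class.forName(className, true, classLoader);
    return (TableHooks) implClass.getDeclaredConstructor().newInstance();
  }
}
```

Because the table is only ever configured with the wrapper, changing the version stored in the predefined table is enough to make a restarted wrapper pick up a different implementation jar.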

Rolling HBase upgrade

Rolling HBase upgrade will be considered an advanced configuration that requires additional work from the cluster administrator. We will add a configuration setting 'master.manage.coprocessors' that defaults to 'true'. When true, CDAP handles coprocessors the same as before and cluster administrators don't have to do any additional work. However, it also means there will be downtime when upgrading CDAP or HBase. When set to false, CDAP will specify only the wrapper coprocessor classname and priority, but not the HDFS path, when it creates an HBase table. Instead of being placed on HDFS, the CDAP wrapper coprocessor jar must be installed on every HBase node and included in the HBase classpath.
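The effect of the setting on table creation can be sketched as below. `CoprocessorSpec` is a hypothetical value class, not a CDAP or HBase type. It only illustrates that the jar path is present when 'master.manage.coprocessors' is true and absent when it is false.

```java
import java.util.Optional;

// Hypothetical description of how a table references its coprocessor.
final class CoprocessorSpec {
  final String className;
  final int priority;
  final Optional<String> jarPath;  // empty: class is expected on the HBase classpath

  private CoprocessorSpec(String className, int priority, Optional<String> jarPath) {
    this.className = className;
    this.priority = priority;
    this.jarPath = jarPath;
  }

  // manageCoprocessors mirrors the 'master.manage.coprocessors' setting.
  static CoprocessorSpec forTable(boolean manageCoprocessors, String className,
                                  int priority, String hdfsJarPath) {
    Optional<String> path =
        manageCoprocessors ? Optional.of(hdfsJarPath) : Optional.empty();
    return new CoprocessorSpec(className, priority, path);
  }
}
```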

In order to upgrade HBase in a rolling fashion, cluster administrators must install the new CDAP wrapper coprocessor on the node to be upgraded and restart the regionserver. 

Both

Since the change to support rolling CDAP upgrades is internal to CDAP, the work to support both is the same as the work to support rolling HBase upgrades alone.

API changes

No changes to programmatic APIs

New REST APIs

No REST API changes

CLI Impact or Changes

  • None

UI Impact or Changes

  • None

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

Test Scenarios

Test ID | Test Description | Expected Results
1 | Run an app that uses all coprocessor features (readless increments, etc.) on CDAP 3.5.2. Perform a rolling upgrade without stopping the app. | Table contents are as expected
2 | Run an app that uses all coprocessor features on CDAP 4.1.0. Perform a rolling upgrade of HBase to another supported version without stopping the app. | Table contents are as expected

Releases

Release 4.1.0

Related Work

  • Work #1

  • Work #2

  • Work #3

Future work

Created in 2020 by Google Inc.