Application Versioning Design

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Goals

The fundamental need driving this work is the ability to run multiple versions of the same program concurrently (in this case, the driving program type is Service), so that requests continue to be served while the earlier version is shut down. This is required for zero downtime of the service when apps are updated.

User Stories 

  • User is running v1.0.0 of analytics application

    • The application has a user service that receives events

  • User wants to upgrade to v2.0.0 of analytics application with minimal down-time

  • User also wants the ability to send a percentage of traffic to v2.0.0 of the application before directing 100% of traffic to it

  • User wants to rollback to v1.0.0 with minimal downtime

Design

The above requirement suggests the need for application versioning. That is, the same application (identified by its name, i.e. the app name) can have multiple versions. Once we have that, a service HTTP endpoint such as /v3/namespaces/<ns-id>/apps/<app-id>/services/<service-id>/methods/<method-name> can be used continuously while, underneath, the user deploys an upgraded version of the app. Callers of the endpoint are oblivious to that change and are still served by one of the service versions. The old service can then be stopped, and the whole process doesn't affect the uptime of the endpoint.

We want to introduce application versioning without breaking backward compatibility. The current scope of the work covers only Services. MR/Workflow/Spark already support concurrent runs, so concurrent runs of these across multiple versions of the app shouldn't be an issue. However, what should happen when multiple versions of a Flow run at the same time is not fully understood, so we will not change the current design choice of disallowing concurrent runs of Flows (even across multiple versions of the app).

Approach

Application versions are chosen by the user and set when the app is created. A version is represented as a string. If the user doesn't provide a version, the default version ("-SNAPSHOT") is used. If the app is created from an artifact, the artifact version from the AppRequest is used as the version of the application. Once created, an app version cannot be changed unless the version ends with the string "-SNAPSHOT". So if a user is not using versioning at all, the current behavior of updating an app keeps working (since the default version is "-SNAPSHOT").

For versioned endpoints (/apps/app-id/versions/version-id/services/MyService/start), the corresponding version of the app is used. If the user calls the non-versioned API (/apps/app-id/services/MyService/start), we will check whether more than one version exists and, if so, return an error code. This maintains backward compatibility for the pre-app-version era and for users who don't want to use app versions. The only exception to this rule is the service method routing endpoints, for which the CDAP Router uses the route configuration to route service requests appropriately (more details in the next section).
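
The version rules above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the precedence (explicit version, then artifact version, then "-SNAPSHOT") is an assumption, since the text does not spell out which source wins when several are available.

```python
DEFAULT_VERSION = "-SNAPSHOT"  # default version string named in the design


def resolve_version(requested=None, artifact_version=None):
    """Pick the version for a new app.

    Assumed precedence: explicit version, then artifact version, then default.
    """
    if requested:
        return requested
    if artifact_version:
        return artifact_version
    return DEFAULT_VERSION


def can_redeploy(existing_version):
    """An existing app version may be overwritten only if it is a snapshot."""
    return existing_version.endswith(DEFAULT_VERSION)
```

With these rules, can_redeploy("1.0.0-SNAPSHOT") is true while can_redeploy("1.0.0") is false, which is why users who never set a version keep today's update behavior.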

Reasoning: 

In CDAP, the relationship between application and artifact is not 1-to-1. Multiple applications can be created from the same artifact by providing different application configurations, so using the artifact version as the application version doesn't work very well. E.g., in Hydrator, multiple pipelines (pipeline == application) are created from the same artifact, the Hydrator artifact. We are introducing a new REST endpoint for deploying applications so that the version is provided explicitly. I think we can have that endpoint default to the artifact version as the application version if the app version is not provided explicitly, and I believe that should fit your use case pretty well. The default version "-SNAPSHOT" is mainly for backward compatibility. That is, if one deploys an artifact+app using the existing endpoint, we use "-SNAPSHOT" as the version internally so that it can be overwritten on redeploy (non-SNAPSHOT application versions are immutable).


Service Routing

Routing to services with multiple versions running concurrently will have an additional feature: controlling the distribution of requests. Instead of being completely random, the user can choose to have, say, 80% of requests served by version "2.0.0", 10% by "1.3.1" and 10% by "1.3.0". This will be made possible by allowing the user to set a distribution strategy for a particular service (ns-name, app-name, service-name), which the CDAP Router will use when deciding which service instance to forward a request to. If the route configuration is missing, the request will follow the default behavior configured in CConf. It could be random, min, max (by string comparison of versions) or simply fail to route.
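
A minimal sketch of the routing decision described above, not the Router's actual code. The function name and arguments are hypothetical, the strategy names follow the none/random/smallest/greatest values listed for cdap-site.xml, and skipping configured versions that are not running is an assumption.

```python
import random


def pick_version(running, route_config=None, default_strategy="random", rng=random):
    """Choose which running version serves a request; None means do not route."""
    if not running:
        return None
    if route_config:
        # Assumption: configured versions that are not running are skipped.
        candidates = [v for v in running if route_config.get(v, 0) > 0]
        if not candidates:
            return None
        weights = [route_config[v] for v in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]
    if default_strategy == "random":
        return rng.choice(list(running))
    if default_strategy == "smallest":
        return min(running)  # plain string comparison
    if default_strategy == "greatest":
        return max(running)  # plain string comparison
    return None  # "none": fail to route
```

One consequence of plain string comparison is worth keeping in mind: with versions "9.0.0" and "10.0.0" running, the "greatest" strategy picks "9.0.0", because "1" sorts before "9".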


Support for other Program Types (other than Services)

We support concurrent runs for Spark, MapReduce and Workflow, so none of their functionality needs any changes. For Flows and Workers, we don't support concurrent runs today, and we will retain the same logic across all versions of the app. For example, if a worker named MyWorker of app version 1 is running and the user tries to start the same worker of app version 2, we will return a CONFLICT error.
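
The start check implied above could look roughly like this. It is a sketch with hypothetical names; the tuple layout of running programs is an assumption, and the rule that a service conflicts only with a run of the same version is inferred from the 409 response of the start endpoint.

```python
SINGLE_RUN_TYPES = {"flow", "worker"}  # no concurrent runs, even across versions
CONCURRENT_TYPES = {"spark", "mapreduce", "workflow"}


def start_status(program_type, app, version, program, running):
    """Return 200 if the start is allowed, 409 (CONFLICT) otherwise.

    `running` is a set of (program_type, app, version, program) tuples.
    """
    if program_type in CONCURRENT_TYPES:
        return 200
    for r_type, r_app, r_version, r_program in running:
        same_program = (r_type, r_app, r_program) == (program_type, app, program)
        if not same_program:
            continue
        # Flows and workers conflict across versions; services only within one.
        if program_type in SINGLE_RUN_TYPES or r_version == version:
            return 409
    return 200
```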

 

API Changes

REST API changes

Each endpoint below is described by its Path, Method, Description, Response Code and Response.
Path: /apps/app-id/versions/version-id/create
Body: { 'App Request' }
Method: POST
Description: Create or update an application from an artifact (this is the only app creation endpoint that will support versioning).
Note: The call needs to be a POST since we won't allow updating of app versions which are not SNAPSHOT.

As mentioned earlier, version-id is simply a string that is a valid CDAP ID. Note that the /apps POST endpoint allows creation of apps from the JAR in the body of the request, and thus will create apps with the "-SNAPSHOT" string as their version. The /apps/app-id PUT endpoint will create or update the app with the "-SNAPSHOT" string as version from existing artifacts. Note that version-id in the above call cannot be empty. The user needs to specify a non-empty string if they want to use the versioning endpoint.

  • Application version creation - implementation:
    The ApplicationId class will be modified to have another variable, the version id. The version of the
    app is stored as part of the key in the Store.

Response Code:
200 - On success
409 - Same application version already exists
500 - Any internal errors

 

Path: /apps/app-id/versions/version-id
Method: DELETE
Description: This endpoint will delete a particular version of the application. Same semantics as deleting the application today, i.e., no programs of that particular app version can be running. If that is the only version of the app, then the app is implicitly deleted.

  • Application version deletion - implementation:
    Similar to the current application deletion, except that the key (composed of namespace, app,
    version id) is removed from the Store.

Response Code:
200 - On success
404 - When application is not available
409 - The application version still has a program running
500 - Any internal errors

 
Path: /apps/app-id
Method: DELETE
Description: For backward compatibility, this will delete the app with the "-SNAPSHOT" string as version.
Response Code: same as above

Path: /apps/app-id/versions
Method: GET
Description: This endpoint will list all the different versions of the app that are present.

  • Implementation:
    Scan the Store table with the key prefix (namespace, app id) and fetch the version strings from
    the keys that are returned by that prefix scan.

Response Code:
200 - On success
404 - When application is not available
500 - Any internal errors

Response: List of versions in the format ["version1", "version2", ...]
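
The Store behavior behind the create, delete and list endpoints above can be illustrated with a toy in-memory stand-in. The class and method names are hypothetical, the real Store is a table rather than a dict, and letting "-SNAPSHOT" versions be overwritten on create follows the immutability rule from the Approach section.

```python
class AppStore:
    """Toy stand-in for the Store, keyed by (namespace, app, version)."""

    def __init__(self):
        self._rows = {}

    def create(self, namespace, app, version, spec):
        key = (namespace, app, version)
        if key in self._rows and not version.endswith("-SNAPSHOT"):
            return 409  # same application version already exists
        self._rows[key] = spec
        return 200

    def list_versions(self, namespace, app):
        # Emulates a prefix scan on (namespace, app); None signals a 404.
        versions = sorted(k[2] for k in self._rows if k[:2] == (namespace, app))
        return versions or None

    def delete(self, namespace, app, version):
        key = (namespace, app, version)
        if key not in self._rows:
            return 404
        # Removing the last version implicitly removes the app.
        del self._rows[key]
        return 200
```
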
Path: /apps/app-id/versions/version-id
Method: GET
Description: This endpoint will return the ApplicationDetail of an app version, similar to what is returned today for /apps/app-id.

Response Code:
200 - On success
404 - When application is not available
500 - Any internal errors

 
Path: /apps/app-id
Method: GET
Description: For backward compatibility, this will return the ApplicationDetail of the app with "-SNAPSHOT" as version.
Response Code: same as above
Response: ApplicationDetail in JSON format

Path: /apps/app-id/versions/version-id/program-type/program-id/start (or stop)
Method: POST
Description: Start or stop a specific program in an app version

Response Code:
200 - On success
404 - When application or program is not available
409 - The program to start (stop) is already running (stopped)
500 - Any internal errors

 
Path: /apps/app-id/program-type/program-id/start (or stop)
Method: POST
Description: Start or stop a specific program in the app with the "-SNAPSHOT" string as version.
Response Code: same as above

Path: /v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig
Body: { 'Routing Config' }
Method: PUT
Description: Upload a load distribution configuration (Routing Config), which is a JSON object whose structure looks as follows:

{ "version-id1":number1, "version-id2":number2, .... }

For example, { "v1":10, "v2":90 }. This config says that version v1 should get 10% and version v2 should get 90% of all the service method requests addressing that particular service. The numbers must be integers and must add up to exactly 100. Otherwise, a BadRequest code will be returned.

If the routing configuration is not present, the user can configure a default behavior in cdap-site.xml. This will be used for all services across all apps and namespaces.

Property: 'cdap.service.http.routing.default'  Values = { none, random, smallest, greatest }
none -> don't route it to any service
random -> choose a random version of the service
smallest -> choose the version that is the smallest (based on string comparison)
greatest -> choose the version that is the greatest (based on string comparison)

Response Code:
200 - On success
400 - When application, service or app version is not available, or the sum of percentages in RouteConfig is not 100
500 - Any internal errors
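
The validation rules for an uploaded Routing Config can be sketched as follows. The function name and the (status, message) return shape are hypothetical, and rejecting negative percentages is an assumption that the text does not state explicitly.

```python
def validate_route_config(config, existing_versions):
    """Return (status, message) for a proposed routing config."""
    for version, percent in config.items():
        if version not in existing_versions:
            return 400, f"version {version!r} doesn't exist"
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(percent, bool) or not isinstance(percent, int):
            return 400, f"percentage for {version!r} must be an integer"
        if percent < 0:  # assumption: negative shares are rejected
            return 400, f"percentage for {version!r} must not be negative"
    total = sum(config.values())
    if total != 100:
        return 400, f"total percentage is {total}, expected 100"
    return 200, "OK"
```

For example, { "v1":10, "v2":90 } passes when both versions exist, while { "v1":98, "v2":1 } is rejected because the total is 99.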

 
Path: /v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig
Method: DELETE
Description: Delete the Routing Config of a given service of an app version

Response Code:
200 - On success
404 - When application, service or app version is not available
500 - Any internal errors

 
Path: /v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig
Method: GET
Description: Get the Routing Config of a given service of an app version

Response Code:
200 - On success
500 - Any internal errors

Response: Routing Config in JSON, or empty if the application or service is not available

 

CLI Changes

 

Each command below is listed with its Command syntax, Description and Response.
Command: create app <app-id> [version <app-version>] <artifact-name> <artifact-version> <scope> [<app-config-file>]
Description: Create or update an application from an artifact (this is the only app creation command that will support versioning). <app-version> is simply a string that is a valid CDAP ID. If <app-version> is not given, it will create or update the app with the "-SNAPSHOT" string as version from existing artifacts.

 

Command: delete app <app-id> [version <app-version>]
Description: This command will delete a particular version of the application if <app-version> is given. Same semantics as deleting the application today, i.e., no programs of that particular app version can be running. If <app-version> is not given, it will delete the app with the "-SNAPSHOT" string as version.
 
Command: list app versions <app-id>
Description: This command will list all the different versions of the app that are present.
Response: A table of versions, one version per row

Command: describe app <app-id> [version <app-version>]
Description: This command will return the programs of an app version if <app-version> is given. If <app-version> is not given, it will return the programs of the app with the "-SNAPSHOT" string as version.
Response: A table of programs with type, id, and description in every row

Command: start <program-type> <app-id.[app-version.]program-id> [<runtime-args>]
ALTERNATIVE: start <program-type> <app-id.program-id> [version <app-version>] [<runtime-args>]
Description: This command will start the program of an app version if <app-version> is given. If <app-version> is not given, it will start the program of the app with the "-SNAPSHOT" string as version.
 

Command: stop <program-type> <app-id.[app-version.]program-id>
ALTERNATIVE: stop <program-type> <app-id.program-id> [version <app-version>]
Description: This command will stop the program of an app version if <app-version> is given. If <app-version> is not given, it will stop the program of the app with the "-SNAPSHOT" string as version.
 

Command: set routeconfig <app-id.service-id> <route-config>
ALTERNATIVE: set routeconfig <route-config> for service <app-id.service-id>
Description: This command will set the service routing configuration.

The <route-config> follows one of these candidate formats:

  • JSON:  { "version-id1":number1, "version-id2":number2, .... }  (pro: consistent with REST API; con: inconsistent with existing CLI)
  • map with key-value separated by "="     'version-id1=number1, version-id2=number2, ...'
    (pro: consistent with map in existing CLI; con: "=" doesn't make sense here, inconsistent with REST API)
  • map with key-value separated by ":"      'version-id1:number1, version-id2:number2, ...'
    (pro: ":" makes more sense than "="; con: ":" requires changes in argument parsing, inconsistent with REST API)
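
To compare the three candidate formats concretely, here is a small parser that accepts any of them and yields the same mapping. It is a sketch only: the function name is hypothetical, and it assumes version ids contain neither "=" nor ":" nor ",".

```python
import json


def parse_route_config(text):
    """Parse any of the three candidate <route-config> formats into a dict."""
    text = text.strip()
    if text.startswith("{"):
        return json.loads(text)  # JSON form, same shape as the REST body
    config = {}
    for pair in text.split(","):
        separator = "=" if "=" in pair else ":"
        version, found, percent = pair.partition(separator)
        if not found:
            raise ValueError(f"missing separator in {pair!r}")
        config[version.strip()] = int(percent)
    return config
```

All of '{ "v1":10, "v2":90 }', 'v1=10, v2=90' and 'v1:10, v2:90' parse to the same dictionary, so the choice between them is mainly about consistency with the REST API versus the existing CLI.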

 

 
Command: get routeconfig <app-id.service-id> <route-config>
Description: Command to get service routing configuration
Response: Routing configuration in JSON format

Command: delete routeconfig <app-id.service-id>
Description: Command to delete service routing configuration

Command: call service <app-id.[app-version.]service-id> <http-method> <endpoint> [headers <headers>] [body <body>] [body:file <local-file-path>]
ALTERNATIVE: call service <app-id.service-id> [version <app-version>] <http-method> <endpoint> [headers <headers>] [body <body>] [body:file <local-file-path>]
Description: Call a service of a specific app version
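
The dotted form <app-id.[app-version.]program-id> used by the start, stop and call service commands needs care, because versions such as "2.0.0" contain dots themselves. One possible parsing rule is sketched below: the first segment is the app id, the last segment is the program id, and everything in between is the version. The function name is hypothetical, and the rule assumes that app ids and program ids never contain dots.

```python
def parse_program_ref(ref, default_version="-SNAPSHOT"):
    """Split 'app-id.[app-version.]program-id' into (app, version, program)."""
    app_id, first_dot, rest = ref.partition(".")
    middle, last_dot, program_id = rest.rpartition(".")
    if not first_dot or not app_id or not program_id or (last_dot and not middle):
        raise ValueError(f"malformed program reference: {ref!r}")
    # No middle segment means no version was given, so use the default.
    version = middle if last_dot else default_version
    return app_id, version, program_id
```

Under this rule "analytics.2.0.0.UserService" resolves to version "2.0.0" and "analytics.UserService" to "-SNAPSHOT". The ALTERNATIVE syntax with an explicit version keyword sidesteps the ambiguity entirely.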


Authorization

We convert CDAP's entity ID to the ID used by Sentry. For simplicity, I propose that we do authorization at the application name level only, and nothing more fine-grained. So a user who has access X to an app will get access X to all versions of that app. This conversion could be done in cdap-security-extn, where we can check the permissions at that level?
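
The proposal amounts to dropping the version before the permission lookup. A minimal sketch, with a hypothetical namedtuple standing in for the versioned application id and a plain dict standing in for the grants kept by the authorization backend:

```python
from collections import namedtuple

# Hypothetical stand-in for a versioned application id.
AppVersionId = namedtuple("AppVersionId", ["namespace", "app", "version"])


def to_authorizable(app_id):
    """Drop the version so permissions are keyed by application name only."""
    return (app_id.namespace, app_id.app)


def is_authorized(grants, user, action, app_id):
    """`grants` maps (user, namespace, app) to a set of allowed actions."""
    return action in grants.get((user,) + to_authorizable(app_id), set())
```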


Logs

We will have an additional logging systemTag for the application version id. With the new LogViewer, we don't explicitly provide the version number, program name, etc., but they can be returned as part of the JSON for that endpoint.


Metrics

We will have an appVersion tag for Metrics, so users can query for metrics across all runs of a particular version of the program. Metrics for an app will not be deleted until all versions of the app are deleted, which is consistent with the current behavior.


Preferences

No changes to Preferences. The hierarchy will be namespace -> app id -> program. There will be no app version dimension. Since we have runtime args, the users can use that if required.


Lineage
Audit
Metadata

TBD

Upgrade Step

Since the version used by default is "-SNAPSHOT", we will need to update the key in the Store (keys that start with appMeta) and add this default version to all the apps created. No other changes should be required.
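
The upgrade step is essentially a key rewrite. In the sketch below the key layouts are assumptions made for illustration: an old key is taken to be ("appMeta", namespace, app) and a new key ("appMeta", namespace, app, version). The real Store key encoding is not described here.

```python
def add_default_version(rows, default_version="-SNAPSHOT"):
    """Return a copy of `rows` with the default version appended to old appMeta keys.

    Assumed key layouts (hypothetical): old = ("appMeta", namespace, app),
    new = ("appMeta", namespace, app, version).
    """
    upgraded = {}
    for key, value in rows.items():
        if key[0] == "appMeta" and len(key) == 3:
            key = key + (default_version,)
        upgraded[key] = value
    return upgraded
```

Because keys that already carry a version are left alone, running the step twice gives the same result as running it once.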


UI Changes

We don't need any changes in the UI for 3.6.0. But for 4.0, we should display the application version along with artifact version when we show a specific Application. That is, artifact version should be displayed along with application version wherever the latter is planned to be shown.

Test Scenarios

Test ID | Test Description | Expected Results
--- | --- | ---
AppVersion1 | Deploy an app version with existing artifact and with runtime args | Deploy successfully
AppVersion2 | Start a service in an app version and call the service method | Service method should respond according to the given runtime args
AppVersion3 | Update the deployed app version with new runtime args | Update should succeed
AppVersion4 | Start the service of the updated app which has been started before updating | Fail to start with error message saying that the same service has been started
AppVersion5 | Stop the same service then start again, and call the same method | Start succeeds and the method responds according to the updated runtime args
AppVersion6 | Deploy another app version with the same existing artifact and with different runtime args | Deploy successfully
AppVersion7 | Start the same service in the new app version and call the service method of both versions | Both versions of the service should respond according to their runtime args
AppVersion8 | Delete an app version without stopping the running service | Fail to delete with error message saying that the service is still running
AppVersion9 | Stop the service and delete the app version | Delete successfully
AppVersion10 | Deploy two non-snapshot versions of the app and start the same service in both versions. Call non-versioned service endpoint no more than 50 times | Both versions of the service should be reached at least once within 50 calls to the non-versioned endpoint according to the random routing strategy without setting RouteConfig
AppVersion11 | Set RouteConfig for a non-existing version | Fail to set with error message saying that the version doesn't exist
AppVersion12 | Set RouteConfig as 99 for the only existing version | Fail to set with error message saying that the total percentage doesn't add up to 100
AppVersion13 | Set RouteConfig as 100 and 1 for the two existing versions | Fail to set with error message saying that the total percentage doesn't add up to 100
AppVersion14 | Set RouteConfig as 98 and 1 for the two existing versions | Fail to set with error message saying that the total percentage doesn't add up to 100
AppVersion15 | Set RouteConfig as 100 and 0 for the two existing versions and call non-versioned service endpoint 20 times | All traffic is routed to the version with RouteConfig 100
AppVersion16 | Set RouteConfig as 10 and 90 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set
AppVersion17 | Set RouteConfig as 20 and 80 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set
AppVersion18 | Set RouteConfig as 60 and 40 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set
AppVersion19 | Delete RouteConfig and call non-versioned service endpoint no more than 50 times | Delete successfully and both versions of the service should be reached within 50 calls to the non-versioned endpoint according to the random routing strategy with empty RouteConfig
AppVersion20 | Delete the namespace while two versions of the app still have services running | Fail to delete with error message saying that some programs are still running
AppVersion21 | Stop one version of the service and delete the namespace while one version of the service is running | Fail to delete with error message saying that some programs are still running
AppVersion22 | Stop the only running service and delete the namespace | Delete successfully
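
The 50-call bound in AppVersion10 and AppVersion19 can be sanity-checked with a short calculation. It assumes the random strategy picks between the two versions uniformly and independently on each call, which is not stated in the design; the function name is hypothetical.

```python
def miss_probability(calls):
    """Chance that one of two versions is never hit in `calls` uniform random picks."""
    # Either version can be the one that is missed; for calls >= 1 the two
    # events cannot both happen, so their probabilities simply add up.
    return 2 * 0.5 ** calls
```

Under that assumption the chance of a false failure after 50 calls is 2^-49, roughly 1.8e-15, so the tests should be stable in practice.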

Releases

Release 3.6.0 (Drop for 9/20)

  • (Internal) Change ApplicationId to contain application version (will be part of the key of Store)
  • Introduce REST API to create/delete Apps with non-default versions
  • Introduce REST API to start/stop programs with versions
  • Upgrade step to add default version to the keys of the Store 
  • Check to make sure the non-versioned APIs fail if multiple versions are present (since we need to define some behavior, we might as well implement the check). This will involve adding the check to all endpoints
  • Version-related endpoints: listing versions, getting the app spec for a particular version (artifact id, app config, etc.)
  • Service endpoint routing - ability to upload a config to provide a forwarding distribution logic at the Router
  • Service endpoint routing when config is not present
     

Future Work

In scope for 4.0

  • CLI
  • Logging and Metrics support
  • Metadata
  • Lineage
  • Audit
  • Authorization (not required)

 

Created in 2020 by Google Inc.