Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

Prior to CDAP 6.0.0, extensions were often added as CDAP applications. Data Prep, Analytics, and Reports were all implemented as CDAP applications, with Data Prep and Analytics running in each namespace they are required in, and Reports running in the system namespace. Running an application in each namespace wastes resources, so it is desirable to move Data Prep and Analytics into the system namespace. However, each of them has namespaced entities, so when moved to the system namespace they both need some sort of namespacing capability. The Reports application also contains part of its logic in the application and part in the CDAP system itself, due to requirements around accessing system information (run records). In order to cleanly implement these extensions as applications, additional functionality must be provided for system applications. 

Goals

Manage a single Data Prep, Analytics, and Reports application for use across all namespaces. 

User Stories 

  1. As a CDAP admin, I want to manage a single system application and not an application per namespace
  2. As a CDAP admin, I want to be able to dynamically scale the Services run by system applications
  3. As a CDAP admin, I do not want users to be able to create Analytics experiments in a namespace that does not exist
  4. As a CDAP admin, I want system applications to use the same backend storage system as the CDAP platform
  5. As a CDAP user, I want Analytics experiments and models to be local to a namespace
  6. As a CDAP user, I want Analytics experiments and models in a namespace to be deleted when the namespace is deleted
  7. As a CDAP system developer, I want to be able to receive notifications of system events (program lifecycle events, entity creation and deletion, etc.)
  8. As a CDAP system developer, I want my namespaced resources to be deleted when the namespace is deleted
  9. As a CDAP system developer, I want to be able to instantiate user scope plugins in a system application
  10. As a CDAP system developer, I want system applications to have the same privileges as the CDAP system
  11. As a CDAP system developer, I want to use the same storage SPI as the CDAP system code

Design

This design will focus on the needs that have been brought up by the Data Prep, Analytics, and Reports applications. Future system applications may require additional functionality, but that is out of scope of this design.

All system apps will be moved to run in the system namespace instead of having one application per namespace. The system apps also need to be changed to be namespace-aware, meaning they need to explicitly take a namespace in their requests and store data in such a way that namespace isolation is achieved. Reports is cross namespace by nature and does not need to worry about namespace isolation.

The following entities need to be namespaced:

App       | Entity
DataPrep  | connection
DataPrep  | workspace
Analytics | experiment
Analytics | split
Analytics | model


The introduction of a namespace concept to application specific entities (connections, experiments, etc) is explicitly handled by each system app. The system apps are responsible for managing namespaces themselves, using a data model that meets their needs. CDAP will be extended to ensure apps have the required capabilities to do this.

REST

Namespace needs to be added as a prefix to all Data Prep and Analytics endpoints that manage namespaced entities. We will call it a 'context' to avoid confusion with the application's namespace. For example, Data Prep connection and workspace endpoints will all be prefixed by:

No Format
/v2/contexts/<context>/connections
/v2/contexts/<context>/workspaces

Similarly, all Analytics endpoints will be prefixed by:

No Format
/v2/contexts/<context>/experiments

The full API would look like:

No Format
GET /v3/namespaces/system/apps/dataprep/services/service/methods/v2/contexts/default/connections
GET /v3/namespaces/system/apps/ModelManagementApp/spark/ModelManagerService/methods/v2/contexts/default/experiments
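To make the path structure concrete, here is a small, self-contained sketch that composes the gateway URL from its segments. The helper names are illustrative, not part of CDAP:

```java
// Illustrative helpers (not a CDAP API) that compose the gateway path for a
// system app service method call, mirroring the examples above.
public class ServicePath {

  /** Builds the gateway path for a method of a program in a system app. */
  public static String methodPath(String app, String programType, String program, String method) {
    return String.format("/v3/namespaces/system/apps/%s/%s/%s/methods/%s",
                         app, programType, program, method);
  }

  /** Builds the app-level method path for a context-scoped resource. */
  public static String contextMethod(String context, String resource) {
    return String.format("v2/contexts/%s/%s", context, resource);
  }
}
```

For example, methodPath("dataprep", "services", "service", contextMethod("default", "connections")) yields the first URL shown above.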

Data Model

The system apps share a common usage pattern: a Table-based dataset serves as a metadata and entity store for CRUD operations, and a FileSet-based dataset serves as a blob store. The dataset types used before 6.0.0 are:

App       | Type                    | Usage
DataPrep  | Table                   | Connection entities
DataPrep  | Table                   | Recipe entities
DataPrep  | FileSet                 | Used for File connections to browse the filesystem and read files
DataPrep  | custom WorkspaceDataset | Workspace entities and metadata. Just a thin wrapper around a Table
Reports   | FileSet                 | Report files
Analytics | IndexedTable            | Experiment entities and metadata
Analytics | IndexedTable            | Model entities and metadata
Analytics | FileSet                 | Trained model files
Analytics | PartitionedFileSet      | Data splits and metadata

System applications will require these dataset types, or dataset types that are comparable in functionality. Reports also has a hidden requirement for listening to system events, and more specifically program state changes. It is hidden today because the platform itself listens to those events and writes to a separate Table that the app eventually scans. In order to move everything into the app, the app should have the ability to listen to those events. This can be done through the MessageFetcher interface that is already exposed to apps, with the system namespace passed as an argument.

Accessing System Tables

Two possible options are considered. The first approach re-uses the existing program types and adds additional capabilities that can only be used when the application is running in the system namespace. The second approach introduces new program types that mirror the existing types but add system capabilities.

In both approaches, CDAP will need to localize the SPI implementation jars along with the container for any system app. If the SPI implementations remain in data-fabric, there may not be additional work required. But if they are moved into their own extension directory like the other SPIs, there will be additional work to localize the jars.

In both approaches, several SPI classes will need to be accessible to applications. The SPI definitions will be moved to their own cdap-storage-spi module with minimal dependencies. Aside from CloseableIterator, which currently exists in cdap-api, everything in the table SPI is self contained. In addition, a new cdap-system-app-api module will be introduced that depends on the SPI module. This has a minor issue with the usage of CloseableIterator in the SPI: either cdap-api will need to depend on the SPI module, or the SPI classes can be moved directly into cdap-api. 

In both approaches, cluster admins will need to ensure that any node in their cluster is able to access the system tables. They cannot restrict access to just the CDAP master nodes.

Option 1 - Re-use Existing Program Types

Creating System Tables

In this option, the Service and Spark Configurers extend a new SystemTableConfigurer interface that allows applications to create system tables when the app is deployed.

Code Block
public interface SystemTableConfigurer {

  /**
   * Create a system table that conforms to the given table specification when the
   * application is deployed. If the table already exists, nothing happens.
   *
   * @throws UnsupportedOperationException if the application is not a system application
   */
  void createTable(StructuredTableSpecification tableSpecification);
}

The existing ServiceConfigurer and SparkConfigurer will extend SystemTableConfigurer. If an application that is not deployed in the system namespace tries to create a system table, an exception is thrown and app deployment fails.

Tables created in this manner will be prefixed with 'app_' in order to avoid conflicts with tables created by the CDAP platform.

Using System Tables

Relevant program type contexts will implement the TransactionRunner interface, allowing developers to write code like:

Code Block
@GET
@Path("entities/{entity}")
public void getEntity(HttpServiceRequest request, HttpServiceResponder responder,
                      @PathParam("entity") String entity) {
  getContext().execute(tableContext -> {
    StructuredTable table = tableContext.getTable(TABLE_ID);
    table.read(...);
    ...
  });
}

If the application is not running in the system namespace, the execute() call will throw an UnsupportedOperationException. 
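The execute() pattern above can be sketched with plain Java. The interfaces below are simplified stand-ins for the real SPI, and an in-memory map takes the place of a transactional store:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the storage SPI; not the real CDAP interfaces.
public class TxSketch {
  interface StructuredTable { String read(String key); }
  interface TableContext { StructuredTable getTable(String tableId); }
  interface TxRunnable<T> { T run(TableContext ctx); }

  // In-memory stand-in for the backing store: tableId -> (key -> value).
  private final Map<String, Map<String, String>> tables = new HashMap<>();

  public void put(String tableId, String key, String value) {
    tables.computeIfAbsent(tableId, t -> new HashMap<>()).put(key, value);
  }

  // The real implementation would open a transaction, run the lambda, and
  // commit or roll back; this sketch simply invokes it.
  public <T> T execute(TxRunnable<T> runnable) {
    TableContext ctx = tableId -> key ->
        tables.getOrDefault(tableId, new HashMap<>()).get(key);
    return runnable.run(ctx);
  }
}
```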

Option 2 - Introduce New Program Types

In this option, new program types are added. System tables are only accessible through these new program types.

Creating System Tables

The existing Service interface will be changed to allow a different type of configurer:

Code Block
public interface Service<T extends ServiceConfigurer> {
  ...
}

A new SystemServiceConfigurer is introduced that allows creating system tables on application deployment and adding system http handlers:

Code Block
public interface SystemServiceConfigurer extends ServiceConfigurer {
  /**
   * Create a system table that conforms to the given table specification when the
   * application is deployed. If the table already exists, nothing happens.
   *
   * @throws UnsupportedOperationException if the application is not a system application
   */
  void createTable(StructuredTableSpecification tableSpecification);
}

Using System Tables

A new SystemHttpServiceHandler is introduced that provides access to a SystemHttpServiceContext that extends TransactionRunner and HttpServiceContext. This allows users to execute system table transactions within the handler.

Code Block
public interface HttpServiceHandler<T extends HttpServiceContext> extends ProgramLifecycle<T> {
  ...
}


public interface SystemHttpServiceHandler extends HttpServiceHandler<SystemHttpServiceContext> {
  ...
}


public interface SystemHttpServiceContext extends HttpServiceContext, TransactionRunner {
  ...
}

This allows users to write code like:

Code Block
@GET
@Path("entities/{entity}")
public void getEntity(HttpServiceRequest request, HttpServiceResponder responder,
                      @PathParam("entity") String entity) {
  getContext().execute(tableContext -> {
    StructuredTable table = tableContext.getTable(TABLE_ID);
    table.read(...);
    ...
  });
}

Option Comparison

Introducing new program types is cleaner in terms of preventing misuse and setting expectations. If a user tries to deploy an app that uses a SystemServiceConfigurer in a user namespace, deployment can fail right away. If existing program types are used, it is possible to deploy an app in a user namespace that attempts to use the TransactionRunner API. In that case it would only fail at runtime instead of at deployment time. Using existing program types may also confuse application developers, as some methods can only successfully run in certain contexts.

The new program types are also more extensible in terms of adding new functionality that should only be available to system applications. This may be relevant for the Reports app in the future, moving logic from inside CDAP into the app.

However, new program types will be a bit more complex in terms of implementation and might not generalize as cleanly for the other non-service program types. 

All things considered, it seems to make more sense to introduce a new type of Service.


This may cause confusion with the existing Transactional.execute() method. Ideally, the SystemHttpServiceContext will leave out the Transactional interface. 

Unit Tests

System apps will need to be able to unit test their storage classes. A new SystemAppTestBase, extending TestBase, is added to cdap-system-app-api that gives access to a TransactionRunner and StructuredTableAdmin.

Code Block
public class SystemAppTestBase extends TestBase {

  public TransactionRunner getTransactionRunner();

  public StructuredTableAdmin getStructuredTableAdmin();
}

Approach 1

In approach 1, in order to achieve namespace isolation and automatic entity deletion when a namespace is deleted, each system app will create a dataset instance in each namespace it needs. For example, a connection in the 'default' namespace will be stored in a Table in the 'default' namespace. In this way, isolation and automatic deletion are handled by the platform; the app only needs to use the correct dataset.

This approach has the nice property that namespacing does not need to be considered in the application's data model. For example, the entity 'context' (namespace) does not need to be present in the backend table schema. Isolation and automatic deletion are handled by the platform. On the downside, it duplicates every backend table for each namespace, with each table likely containing a very small number of rows. This splits state and makes upgrade more difficult. It also requires the system table SPI to be namespaced, which is not a natural concept for tables within the CDAP system.

Programmatic APIs

In order to implement namespace logic, system apps need to be able to perform several operations that are not supported prior to CDAP 6.0.0.

  1. Check namespace existence – System apps need to be able to check that a namespace exists in CDAP before managing any custom entities in that namespace.

  2. Dataset admin operations in another namespace – System apps need to be able to create a dataset in a specific namespace if it doesn't already exist. DatasetManager methods are not namespace aware and currently only operate within the namespace of the application.

  3. Plugin operations in another namespace – DataPrep needs to be able to instantiate UDDs (User Defined Directives) in order to execute directive lists. This means system applications need to be able to instantiate plugins whose artifacts are user scoped in some namespace. To give a more concrete example, suppose a 'my-custom-directive' UDD is deployed as a user artifact in namespace 'default'. The Data Prep system application needs to be able to instantiate that directive even though the app is running in the 'system' namespace.

To support this, several existing interfaces are enhanced with namespaced versions of their existing methods. DatasetContext already has a way to get a dataset from another namespace, but DatasetManager cannot check existence of or create a dataset in another namespace. DatasetManager will need to be enhanced with namespaced versions of its current methods:

Code Block
public interface DatasetManager {
  boolean datasetExists(String name) throws DatasetManagementException;
  boolean datasetExists(String namespace, String name) throws DatasetManagementException;
  ...
  void createDataset(String name, String type, DatasetProperties properties) throws DatasetManagementException;
  void createDataset(String namespace, String name, String type, DatasetProperties properties) throws DatasetManagementException;
  ...
}
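One way to keep the two variants consistent is for the single-argument form to delegate to the namespaced form using the program's own namespace. A hypothetical sketch (the interface and method names here are illustrative):

```java
// Hypothetical sketch: the single-argument method delegates to the namespaced
// variant, defaulting to the namespace the program is running in.
public interface NamespacedDatasetLookup {
  String currentNamespace();
  boolean datasetExists(String namespace, String name);

  default boolean datasetExists(String name) {
    return datasetExists(currentNamespace(), name);
  }
}
```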

Similarly, the ArtifactManager interface available to Service programs must also be modified.

Code Block
public interface ArtifactManager {
  List<ArtifactInfo> listArtifacts() throws IOException;
  List<ArtifactInfo> listArtifacts(String namespace) throws IOException;  

  CloseableClassLoader createClassLoader(ArtifactInfo artifactInfo,
                                         @Nullable ClassLoader parentClassLoader) throws IOException;
}

The Admin interface will be enhanced to check for namespace existence:

Code Block
public interface Admin extends DatasetManager, SecureStoreManager, MessagingAdmin {
  boolean namespaceExists(String namespace);


  NamespaceMeta getNamespace(String namespace);
}

This has a side benefit of bringing more consistency to the APIs instead of having mixed APIs that sometimes allow cross namespace operations and sometimes don't.

System app service methods would typically look something like:

Code Block
@Path("/v2")
public class ModelManagerServiceHandler implements SparkHttpServiceHandler {

  @GET
  @Path("/contexts/{context}/experiments")
  public void listExperiments(HttpServiceRequest request, HttpServiceResponder responder,
                              @PathParam("context") String context) {
    Admin admin = getContext().getAdmin();
    if (!admin.namespaceExists(context)) {
      responder.sendError(404, "Namespace " + context + " not found.");
      return;
    }
    if (!admin.datasetExists(context, EXPERIMENTS_DATASET)) {
      admin.createDataset(context, EXPERIMENTS_DATASET, "table", EXPERIMENTS_DATASET_PROPERTIES);
    }
    getContext().execute(datasetContext -> {
      Table experiments = datasetContext.getDataset(context, EXPERIMENTS_DATASET);
      ...
      responder.sendJson(...);
      });
  }
  ...
}

With this approach, the developer is forced to write quite a lot of repetitive code to check whether the namespace exists. For convenience, the platform could support some syntactic sugar to check the namespace for the user if the endpoint is annotated with a new Namespaced annotation. For example, something like:

Code Block
@Path("/v2")
public class ModelManagerServiceHandler implements SparkHttpServiceHandler {

  @GET
  @Namespaced(namespace = "context")
  @Path("/contexts/{context}/experiments")
  public void listExperiments(HttpServiceRequest request, HttpServiceResponder responder,
                              @PathParam("context") String context) {
    Admin admin = getContext().getAdmin();
    if (!admin.datasetExists(context, EXPERIMENTS_DATASET)) {
      admin.createDataset(context, EXPERIMENTS_DATASET, "table", EXPERIMENTS_DATASET_PROPERTIES);
    }
    getContext().execute(datasetContext -> {
      Table experiments = datasetContext.getDataset(context, EXPERIMENTS_DATASET);
      ...
      responder.sendJson(...);
      });
  }
  ...
}

Note that there is still a lot of repetitive code around creating the dataset if it doesn't exist.
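A sketch of what the proposed annotation could look like, with the platform reading it via reflection before dispatching to the handler method. The names here are assumptions, not an existing CDAP API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch of the proposed @Namespaced annotation.
public class NamespacedSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface Namespaced {
    /** Name of the path parameter that holds the namespace to validate. */
    String namespace();
  }

  public static class Handler {
    @Namespaced(namespace = "context")
    public void listExperiments(String context) { }
  }

  /**
   * Returns the path-parameter name the platform should validate before
   * dispatching to the named method, or null if the method is not annotated.
   */
  public static String namespaceParam(Class<?> handler, String methodName) {
    for (Method m : handler.getMethods()) {
      if (m.getName().equals(methodName)) {
        Namespaced ann = m.getAnnotation(Namespaced.class);
        return ann == null ? null : ann.namespace();
      }
    }
    return null;
  }
}
```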

Authorization

An application running in the system namespace will have the same privileges as the CDAP system. There will need to be some extra work around impersonation and authorization for all methods exposed in the context objects to ensure that the right user is being used. 

For example, when creating a dataset in a namespace, the CDAP system user should be authorized to create it, but the namespace user should actually own the dataset.

Upgrade

When upgrading from CDAP 5.1.x to 6.0.0, no additional work needs to be done regarding datasets, as they will already be in their respective namespaces. Old versions of the DataPrep and Analytics apps will remain in any namespace they were enabled in, so it should be documented that users should delete these apps. CDAP could also provide a tool to do this cleanup. 

For future upgrades (CDAP 6.0.x to 6.1.0 and beyond), every table in every namespace must be upgraded.

Approach 2

In the second approach, the system application maintains a set of backend tables in the system namespace and builds namespacing into the table schema. Isolation between namespaces must be implemented by the application. In addition, the application needs to subscribe to system events and delete entities when it sees a namespace deletion event.

This approach has the benefit of matching the CDAP system storage architecture, where there is a single set of tables for all namespaces and not a table per namespace. Upgrade is easier in the future, as only a single set of tables must be upgraded. Authorization is also simpler, as everything happens in system scope and cross namespace authorization never comes into the picture. On the downside, the application needs to listen to namespace deletion events and delete relevant entities itself. This has to be done in a worker program that is always running, which uses one more container than before.

The service endpoints are straightforward and will look something like:

Code Block
@Path("/v2")
public class ModelManagerServiceHandler implements SparkHttpServiceHandler {

  @GET
  @Path("/contexts/{context}/experiments")
  public void listExperiments(HttpServiceRequest request, HttpServiceResponder responder,
                              @PathParam("context") String context) {
    Admin admin = getContext().getAdmin();
    if (!admin.namespaceExists(context)) {
      responder.sendError(404, "Namespace " + context + " not found.");
      return;
    }
    // use SPI to list experiments in context
  }
  ...
}

Namespacing

Each entity table will contain 'context' (namespace) as a column. The context will be the first part of the primary key. In this way, listing, get, delete, and update will always require context to be specified, ensuring entities don't get mixed across contexts.
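The effect of putting the context first in the primary key can be illustrated with a sorted in-memory map standing in for the system table: range scans over the key prefix scope every operation to a single context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of context-prefixed primary keys: the context is the first key part,
// so list and delete operations naturally scope to one context.
public class ContextKeyedStore {
  // Keys are "<context>\0<entityId>"; '\0' keeps the prefix scan unambiguous.
  private final NavigableMap<String, String> rows = new TreeMap<>();

  private static String key(String context, String id) { return context + '\0' + id; }

  public void put(String context, String id, String value) {
    rows.put(key(context, id), value);
  }

  /** Lists all entities in one context via a range scan over the key prefix. */
  public List<String> list(String context) {
    return new ArrayList<>(rows.subMap(context + '\0', context + '\1').values());
  }

  /** Deletes every entity in a context, e.g. on a namespace deletion event. */
  public void deleteContext(String context) {
    rows.subMap(context + '\0', context + '\1').clear();
  }
}
```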

Namespace Deletion

The application will need a worker to listen to namespace delete events. The worker will look something like:

Code Block
public static class SystemEventListener extends AbstractWorker {

  private volatile boolean stopped;

  @Override
  public void run() {
    while (!stopped) {
      String messageId = getLastMessageId();
      try (CloseableIterator<Message> iter = getContext().getMessageFetcher().fetch("system", "topic", 100, messageId)) {
        while (iter.hasNext()) {
          Message message = iter.next();
          MetadataMessage event = GSON.fromJson(message.getPayloadAsString(), MetadataMessage.class);
          EntityId deletedEntity = event.getEntityId();
          MetadataMessage.Type eventType = event.getType();
          if (eventType == MetadataMessage.Type.ENTITY_DELETION && deletedEntity.getType() == EntityType.NAMESPACE) {
            String context = ((NamespaceId) deletedEntity).getNamespace();
            // delete entities from storage
            ...
          }
          saveMessageId(message.getId());
        }
      }
      }
      TimeUnit.MILLISECONDS.sleep(200);
    }
  }

  @Override
  public void stop() {
    stopped = true;
  }


  private String getLastMessageId() {
    // lookup message ID from storage
  }


  private void saveMessageId(String messageId) {
    // write message ID to storage
  }
}

In order to prevent race conditions when a namespace is deleted and immediately re-created, each entity will contain a creation time. The namespace metadata will also contain the time that it was created. Based on these times, the application can filter out entities that should not be returned, and actual deletion can be handled in the background.
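The visibility check is simple arithmetic: an entity belongs to the current incarnation of a namespace only if it was created at or after the namespace's creation time. A minimal sketch:

```java
// Sketch of the creation-time filter. An entity created before the current
// namespace incarnation must belong to a previously deleted namespace of the
// same name, so it is hidden until background deletion removes it.
public class CreationTimeFilter {
  public static boolean isVisible(long entityCreatedTs, long namespaceCreatedTs) {
    return entityCreatedTs >= namespaceCreatedTs;
  }
}
```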

Scheduling Logic

One main downside to the worker approach is that the worker will be doing nothing almost all of the time. It also needs to manage extra state about the topic offset and understand how to decode the messages.

A better way to handle this would be to allow applications to schedule some logic to be run whenever a system event occurs. Similar to how workflows can be triggered based on program lifecycle events, system applications would be able to trigger lambda functions based on system events. This will simplify the worker code by removing the state logic, removing the polling loop, and ideally removing the need to understand the message format. The code would look more like:

Code Block
public static class NamespaceDeleteListener extends SystemEventListener {

  @Override
  public void run(MetadataMessage event) {
    String context = event.getEntityId().getNamespace();
    // delete entities from storage
    ...
  }

}

This would involve introducing a new program type and APIs to schedule that type, which could look something like:

Code Block
addEventListener(new NamespaceDeleteListener());


schedule(buildSchedule("namespaceDelete", MetadataMessage.Type.ENTITY_DELETION, EntityType.NAMESPACE));


API changes

New Programmatic APIs

There will be new programmatic APIs to allow more cross namespace access.

Deprecated Programmatic APIs

None

New REST APIs

There are no new CDAP APIs, but almost every Data Prep and Analytics endpoint will be modified to include namespace in the path.

Deprecated REST API

None

CLI Impact or Changes

  • None

UI Impact or Changes

  • UI needs to be updated to use the new REST APIs.

Security Impact 

Care needs to be taken to ensure that system applications are authorized to access CDAP entities in other namespaces.

Impact on Infrastructure Outages 

None

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release 6.0.0

Related Work

  • Work #1
  • Work #2
  • Work #3


Future work

...