Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
There is a long-standing bug in which plugins that expose the same classes run into classloading issues at runtime. It is most commonly seen with the Avro, Parquet, and ORC classes in the various file-based plugins.
In addition to the functional breakage, these plugins all contain duplicate code, which creates tech debt. Whenever a new format is added to one plugin, it must be added to all the others; whenever a bug is fixed in one, it must be fixed in all the others.
Goals
Provide a pluggable framework for adding and modifying formats that eliminates both the classloading problems and the code duplication.
User Stories
- As a pipeline developer, I want the same set of formats to be available across all file-based sources and sinks
- As a pipeline developer, I want to be able to use any combination of plugins in my pipeline without taking format into consideration
- As a pipeline developer, I want to know as soon as possible if I have configured an invalid schema for my format
- As a pipeline developer, I want the format to tell me what schema should be used if it requires a specific schema
- As a pipeline developer, I want the format of my source or sink to be set as metadata on my dataset
- As a plugin developer, I want to be able to add a new format without modifying the code or widgets for plugins that use the format
- As a plugin developer, I want to be able to add a new format without any changes to CDAP platform or UI code
- As a plugin developer, I want to be able to write a format that requires additional configuration properties, like a custom delimiter
- As a plugin developer, I want to be able to provide documentation for formats
- As a CDAP administrator, I want to be able to control the set of formats that pipeline developers can use
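Several of the stories above (early schema validation, a format that dictates its own schema, format-specific properties such as a delimiter, and administrator control over the available formats) can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: `FileFormat`, `TextFormat`, `DelimitedFormat`, `FormatRegistry`, the `delimiter` property name, and the simplification of a schema to a list of field names are assumptions, not existing CDAP APIs or the final design.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these types are existing CDAP APIs.
public class FormatSketch {

  /** A pluggable format. A schema is simplified here to an ordered list of field names. */
  public interface FileFormat {
    String name();

    /** The schema this format insists on, or empty if any schema is acceptable. */
    Optional<List<String>> requiredSchema();

    /** Fails fast, at configure time, if the schema or properties are invalid. */
    void validate(List<String> schemaFields, Map<String, String> properties);
  }

  /** Example format that dictates its own schema (assumed: a single "body" field). */
  public static class TextFormat implements FileFormat {
    @Override public String name() { return "text"; }

    @Override public Optional<List<String>> requiredSchema() {
      return Optional.of(Collections.singletonList("body"));
    }

    @Override public void validate(List<String> schemaFields, Map<String, String> properties) {
      List<String> required = requiredSchema().get();
      if (!required.equals(schemaFields)) {
        throw new IllegalArgumentException("Format 'text' requires schema " + required);
      }
    }
  }

  /** Example format that needs an extra configuration property. */
  public static class DelimitedFormat implements FileFormat {
    @Override public String name() { return "delimited"; }

    @Override public Optional<List<String>> requiredSchema() { return Optional.empty(); }

    @Override public void validate(List<String> schemaFields, Map<String, String> properties) {
      String delimiter = properties.get("delimiter");
      if (delimiter == null || delimiter.isEmpty()) {
        throw new IllegalArgumentException("Format 'delimited' requires a 'delimiter' property");
      }
      if (schemaFields.isEmpty()) {
        throw new IllegalArgumentException("Format 'delimited' requires at least one field");
      }
    }
  }

  /** Looks up formats by name; only formats enabled by the administrator are returned. */
  public static class FormatRegistry {
    private final Map<String, FileFormat> formats = new LinkedHashMap<>();
    private final Set<String> enabled;

    public FormatRegistry(Set<String> enabled) { this.enabled = new HashSet<>(enabled); }

    public void register(FileFormat format) { formats.put(format.name(), format); }

    public FileFormat get(String name) {
      FileFormat format = formats.get(name);
      if (format == null) {
        throw new IllegalArgumentException("Unknown format: " + name);
      }
      if (!enabled.contains(name)) {
        throw new IllegalArgumentException("Format is not enabled: " + name);
      }
      return format;
    }
  }

  public static void main(String[] args) {
    FormatRegistry registry =
        new FormatRegistry(new HashSet<>(Arrays.asList("text", "delimited")));
    registry.register(new TextFormat());
    registry.register(new DelimitedFormat());

    Map<String, String> props = new HashMap<>();
    props.put("delimiter", "|");
    // Passes: a delimiter is configured and the schema is non-empty.
    registry.get("delimited").validate(Arrays.asList("id", "name"), props);

    // The required schema is what a UI could pre-fill for the pipeline developer.
    System.out.println(registry.get("text").requiredSchema().get());
  }
}
```

In this sketch a source or sink would depend only on the `FileFormat` interface and look formats up by name, so adding a format would not require touching the plugins that use it. How formats are actually packaged, discovered, and isolated by classloader is left to the Design section.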
Design
Cover details on assumptions made, design alternatives considered, and the high-level design
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When the application is not available; 500 - Any internal error | |
Deprecated REST APIs
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design address it?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of failures in downstream components such as YARN and HBase) and how the design handles these failures
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3