Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

There is a long-standing bug where plugins that expose the same classes run into classloading issues at runtime. This is most commonly seen with the Avro, Parquet, and ORC classes in the various file-based plugins.

Two Jira issues (references not rendered) are a couple of examples. The problem arises when a pipeline uses plugins from two different artifacts that expose the same class. For example, the File source from core-plugins exposes AvroKey, and the GCS sink from google-plugins also exposes AvroKey. Two separate classloaders end up defining the class, and pipelines fail with confusing errors like "AvroKey cannot be cast to AvroKey".
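The failure mode can be reproduced outside CDAP. The following is a minimal sketch, not CDAP code: `Key` stands in for a class like AvroKey, and `IsolatingLoader` is a hypothetical classloader that defines `Key` itself instead of delegating to its parent, which is assumed here to approximate how two plugin artifacts each end up with their own copy of the class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class DuplicateClassDemo {

  /** Stand-in for a class such as AvroKey that two plugin artifacts both expose. */
  public static class Key { }

  /** Hypothetical plugin classloader: defines Key itself rather than asking its parent. */
  static class IsolatingLoader extends ClassLoader {
    IsolatingLoader(ClassLoader parent) {
      super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
      if (!name.equals(Key.class.getName())) {
        return super.loadClass(name, resolve); // every other class is shared via the parent
      }
      synchronized (getClassLoadingLock(name)) {
        Class<?> loaded = findLoadedClass(name);
        return loaded != null ? loaded : defineFromParentBytes(name);
      }
    }

    /** Reads the class file through the parent, but defines the class in this loader. */
    private Class<?> defineFromParentBytes(String name) throws ClassNotFoundException {
      String resource = name.replace('.', '/') + ".class";
      try (InputStream in = getParent().getResourceAsStream(resource)) {
        if (in == null) {
          throw new ClassNotFoundException(name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
          out.write(buffer, 0, n);
        }
        byte[] bytes = out.toByteArray();
        return defineClass(name, bytes, 0, bytes.length);
      } catch (IOException e) {
        throw new ClassNotFoundException(name, e);
      }
    }
  }

  /** Loads Key through a fresh IsolatingLoader, as a separate plugin artifact would. */
  static Class<?> loadKeyInNewLoader() {
    try {
      ClassLoader parent = DuplicateClassDemo.class.getClassLoader();
      return new IsolatingLoader(parent).loadClass(Key.class.getName());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** True if an instance of 'from' cannot be cast to 'to'. */
  static boolean castFails(Class<?> from, Class<?> to) {
    try {
      to.cast(from.getDeclaredConstructor().newInstance());
      return false;
    } catch (ClassCastException e) {
      System.out.println(e.getMessage());
      return true;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Class<?> fromPluginA = loadKeyInNewLoader();
    Class<?> fromPluginB = loadKeyInNewLoader();
    System.out.println("same name:  " + fromPluginA.getName().equals(fromPluginB.getName()));
    System.out.println("same class: " + (fromPluginA == fromPluginB));
    System.out.println("cast fails: " + castFails(fromPluginA, fromPluginB));
  }
}
```

The two `Class` objects share a fully qualified name but are distinct runtime types, because the JVM identifies a class by its name together with its defining classloader. That is why the error message names the same class on both sides of the cast.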


In addition to functional breakage, these plugins all carry duplicated code, which creates technical debt. Whenever a new format is added to one plugin, it must be added to all the others. Whenever a bug is fixed in one, it must be fixed in all the others.

Goals

Provide a pluggable framework for adding and modifying formats that eliminates both the classloading conflicts and the code duplication.

User Stories 

  1. As a pipeline developer, I want the same set of formats to be available across all file-based sources and sinks
  2. As a pipeline developer, I want to be able to use any combination of plugins in my pipeline without taking format into consideration
  3. As a pipeline developer, I want to know as soon as possible if I have configured an invalid schema for my format
  4. As a pipeline developer, I want the format to tell me what schema should be used if it requires a specific schema
  5. As a pipeline developer, I want the format of my source or sink to be set as metadata on my dataset
  6. As a plugin developer, I want to be able to add a new format without modifying the code or widgets for plugins that use the format
  7. As a plugin developer, I want to be able to add a new format without any changes to CDAP platform or UI code
  8. As a plugin developer, I want to be able to write a format that requires additional configuration properties, like a custom delimiter
  9. As a plugin developer, I want to be able to provide documentation for formats
  10. As a CDAP administrator, I want to be able to control the set of formats that pipeline developers can use
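To make stories 3, 4, 6, and 8 more concrete, here is one possible shape for a format contract. It is only a sketch under stated assumptions: `FileFormat`, `Registry`, the two example formats, the `delimiter` property name, and the representation of a schema as a list of field names are all hypothetical and are not existing CDAP APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class FormatRegistrySketch {

  /** Hypothetical contract for a format plugin; not an existing CDAP API. */
  interface FileFormat {
    String name();

    /** Rejects a schema the format cannot handle, so errors surface at configure time. */
    void validateSchema(List<String> fieldNames, Map<String, String> properties);

    /** The schema the format dictates, or empty if any schema is acceptable. */
    Optional<List<String>> requiredSchema();
  }

  /** Hypothetical text format that only accepts a single "body" field. */
  static class TextFormat implements FileFormat {
    private static final List<String> SCHEMA = Collections.singletonList("body");

    public String name() {
      return "text";
    }

    public void validateSchema(List<String> fieldNames, Map<String, String> properties) {
      if (!SCHEMA.equals(fieldNames)) {
        throw new IllegalArgumentException("text format requires schema " + SCHEMA);
      }
    }

    public Optional<List<String>> requiredSchema() {
      return Optional.of(SCHEMA);
    }
  }

  /** Hypothetical delimited format that needs an extra "delimiter" property. */
  static class DelimitedFormat implements FileFormat {
    public String name() {
      return "delimited";
    }

    public void validateSchema(List<String> fieldNames, Map<String, String> properties) {
      String delimiter = properties.get("delimiter");
      if (delimiter == null || delimiter.isEmpty()) {
        throw new IllegalArgumentException("delimited format requires a 'delimiter' property");
      }
      if (fieldNames.isEmpty()) {
        throw new IllegalArgumentException("delimited format requires at least one field");
      }
    }

    public Optional<List<String>> requiredSchema() {
      return Optional.empty();
    }
  }

  /** Sources and sinks look formats up by name, so a new format needs no plugin changes. */
  static class Registry {
    private final Map<String, FileFormat> formats = new LinkedHashMap<>();

    void register(FileFormat format) {
      formats.put(format.name(), format);
    }

    FileFormat get(String name) {
      FileFormat format = formats.get(name);
      if (format == null) {
        throw new IllegalArgumentException(
            "Unknown format '" + name + "', expected one of " + formats.keySet());
      }
      return format;
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register(new TextFormat());
    registry.register(new DelimitedFormat());

    System.out.println(registry.get("text").requiredSchema());
    registry.get("delimited")
        .validateSchema(Arrays.asList("id", "name"), Collections.singletonMap("delimiter", "|"));
    try {
      registry.get("text").validateSchema(Arrays.asList("id", "name"), Collections.emptyMap());
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this sketch a source or sink depends only on the `FileFormat` contract and a format name, so format-specific classes would not need to be exposed by each plugin artifact.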

Design

Cover details on assumptions made, design alternatives considered, high level design

Approach

Approach #1

Approach #2

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Response Code | Response
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors |

Deprecated REST API

Path | Method | Description
/v3/apps/<app-id> | GET | Returns the application spec for a given application

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What is the impact on authorization, and how does the design address it?

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of failures in downstream components such as YARN and HBase) and how the design addresses these aspects

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3


Future work
