CDAP Edge

Use case(s)

  • Edge side data collection and aggregation (de-centralized)
  • Edge side anomaly detection and notification
  • Edge side data cleansing and transformation
  • Collect data from local sensors, devices and systems
  • Transport aggregated data using MQTT, HTTP or TCP

Goals

  • Use CDAP Standalone in production deployments for constrained environments
  • Lightweight, with the minimal capabilities needed to run IoT-type applications on the edge
  • Run "All" CDAP Applications
  • Run in environments that are constrained by memory, disk and compute
  • Self-healing capabilities
  • Integrate with central CDAP
  • Remote update or upgrade capabilities

Area of Focus

  • Resiliency and reliability of CDAP Standalone (which was not built to run in a production-like environment), including self-healing capabilities
  • Reduce CDAP Standalone footprint
  • Customize required components and programs
  • Integration with central CDAP Management
  • Reduce run-time footprint (remove unnecessary components and sub-systems)

High Level Requirements

  • Support long running applications (CDAP Applications)
  • Support the ability to run in a constrained environment with 512 MB of memory (by reducing the footprint)
  • Automatic clean-up or management of transient data and metadata
  • Remove Kafka and Zookeeper dependencies
  • Support the ability to run a reasonable number of applications
  • Remove User Interface and harden REST API interface
  • Remove extension interfaces

Technical Breakdown

LITE-001 : Remove Kafka and Zookeeper dependencies

Kafka and Zookeeper add significantly to the footprint of CDAP Standalone, so replacing them with a lighter, reliable messaging service would reduce it. An initiative to remove these dependencies is already under way. Find out more about this here

LITE-002 : Remove Hive and its dependencies

There are no ad-hoc query requirements at the edge, so the request is to remove the ad-hoc querying capabilities and their dependencies to keep the footprint of Standalone light on the edge.

LITE-003 : Disable or Remove Audit Log Capabilities

There will be thousands of CDAP Lite deployments in the field, so aggregating audit logs for individual instances is not required.

LITE-004 : Remove User Interface or Replace with Operational Interface

To reduce the footprint, the development user interface should be removed, since it serves no purpose on the edge. However, a simple operational interface should exist to monitor the instance on the edge.

LITE-005 : Cleanup, Log Rotation

User applications and CDAP Standalone generate large amounts of data and metadata. There is currently no capability to clean up this data periodically. The system should provide the ability to manage the TTL of the data and metadata being generated.
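The TTL requirement above can be illustrated with a minimal sketch. It assumes the transient data lives as plain files under a single directory and that file modification time is an acceptable measure of age; neither is stated in this document, and the function name `purge_expired` is hypothetical rather than part of CDAP.

```python
import os
import time
from pathlib import Path


def purge_expired(root, ttl_seconds, now=None):
    """Delete regular files under `root` whose mtime is older than the TTL.

    `root` and `ttl_seconds` are illustrative parameters, not CDAP settings.
    Returns the sorted list of deleted paths.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```

A periodic task could call this once per data directory, with a separate TTL for logs, metrics and other metadata.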

LITE-006 : Self-healing capability

CDAP Lite is deployed in the field. Instances are generally unmonitored and cannot rely on constant maintenance. After a shutdown, restart or crash, the system should return to normal operation as soon as possible. (One customer expects recovery within 4 minutes.)
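One possible shape for this behaviour is an external supervisor that restarts the process after an abnormal exit. The sketch below is only an illustration under stated assumptions: the instance is started by a single command, a zero exit code means a deliberate shutdown, and the restart limits and delays are placeholder values. The name `supervise` and all of its parameters are hypothetical.

```python
import subprocess
import time


def supervise(command, max_restarts=5, base_delay=1.0, max_delay=60.0,
              run=subprocess.call, sleep=time.sleep):
    """Keep `command` running, restarting it after abnormal exits.

    Returns the number of restarts performed before a clean exit (code 0).
    Raises RuntimeError once `max_restarts` is exhausted.
    """
    restarts = 0
    while True:
        exit_code = run(command)
        if exit_code == 0:
            return restarts  # treated as a deliberate shutdown (assumption)
        if restarts >= max_restarts:
            raise RuntimeError(f"{command!r} failed {restarts + 1} times; giving up")
        # Exponential backoff, capped so a restart stays well inside the
        # recovery budget (for example the 4 minutes mentioned above).
        delay = min(base_delay * 2 ** restarts, max_delay)
        sleep(delay)
        restarts += 1
```

On a real device the same role is often filled by the operating system's service manager; the point of the sketch is only the restart-with-bounded-backoff policy.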

LITE-007 : Fix memory leaks with in-memory MR, Spark and User Apps

Currently, when MapReduce and Spark programs are executed constantly within Standalone, Standalone crashes. This is due to memory leaks within the MR and Spark frameworks. The system should provide isolation and seamlessly handle memory leaks, whether in the frameworks or in the user applications deployed within CDAP.
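One common way to get this kind of isolation is to run each job in its own short-lived process, so that whatever the job leaks is reclaimed by the operating system when the process exits. The sketch below shows the idea with a Python child interpreter standing in for a program run; it is an analogy under that assumption, not how CDAP launches MapReduce or Spark, and `run_isolated` is a hypothetical helper.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a fresh interpreter process.

    Memory allocated (or leaked) by the snippet is released when the child
    exits, so it cannot accumulate in the long-running parent.
    Returns (exit_code, stdout).
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # a hung job is killed rather than left running
    )
    return result.returncode, result.stdout
```

The trade-off is process start-up cost per run, which matters on constrained hardware and would need to be weighed against the 512 MB memory target.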

LITE-008 : Support ability to be managed remotely

CDAP Lite in the field should be manageable from a central CDAP instance. It should be possible to update an instance to a new version of CDAP Lite and to seamlessly upgrade the CDAP Applications running within it. The central CDAP instance should also be able to gather metrics and logs from, and check the status of, remote CDAP Lite instances running in the field.

LITE-009 : Disable deployment of Tracker App

Since the audit log capabilities will be disabled, deploying and running the Tracker app in Standalone can also be disabled to reduce the footprint.

LITE-010 : Disable authorization service 

To reduce the footprint, the authorization service can be disabled.

LITE-011 : Disable Stream Service 

If there is no requirement to push data to streams, disabling the Stream service will reduce the footprint of CDAP Standalone.

Open Items / Discussion points

 

Action Items

 

 

 

Created in 2020 by Google Inc.