CDAP Components and Functional Responsibilities
Infrastructure components used by Cask Data Application Platform (CDAP)
The following underlying infrastructure components are used by CDAP and/or by CDAP applications running in CDAP. They are listed in no particular order of priority.
- HDFS
- HBase
- Hive
- Kafka
- YARN
- Zookeeper
- KMS
- Sentry ???
Functional use of infrastructure components
This section describes what each of the underlying components is used for.
HDFS
- CDAP Stream
- Apache Tephra WAL
- Deployed Application Artifact and Dataset Artifact
- Aggregated Logs
- CDAP Fileset Dataset
- YARN distributed cache
- Coprocessor jars
HBase
- CDAP System data/metadata (e.g., Preferences, Application, Namespace, Artifact…)
- Metrics Cube
- Lineage
- Workflow Statistics
- Run Record and Statistics
- Checkpoint information
- CDAP Table Dataset
Kafka
- Logs
- Metrics
- Audit Logs (Will be moved to HBase in 4.0)
- Metadata updates (Will be moved to HBase in 4.0)
- Notifications (Will be moved to HBase in 4.x)
YARN
- System Services
- User applications
Zookeeper
- Routing Tables
- Coordination
- Secret keys
- Auth keys
Hive
- Dataset integration
- Schema
- Properties
- Serde
KMS
- User Secrets (e.g., passwords, access tokens)
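The component-to-responsibility lists above can also be kept as a small lookup table, which makes it easy to ask which components are involved in a given concern. The following is only an illustrative sketch: the entries are transcribed from this page, while the names `COMPONENT_USES` and `components_for` are hypothetical and not part of any CDAP API.

```python
# Illustrative lookup table transcribed from the lists on this page.
# COMPONENT_USES and components_for are hypothetical names, not CDAP APIs.
COMPONENT_USES = {
    "HDFS": [
        "CDAP Stream",
        "Apache Tephra WAL",
        "Deployed Application Artifact and Dataset Artifact",
        "Aggregated Logs",
        "CDAP Fileset Dataset",
        "YARN distributed cache",
        "Coprocessor jars",
    ],
    "HBase": [
        "CDAP System data/metadata",
        "Metrics Cube",
        "Lineage",
        "Workflow Statistics",
        "Run Record and Statistics",
        "Checkpoint information",
        "CDAP Table Dataset",
    ],
    "Kafka": ["Logs", "Metrics", "Audit Logs", "Metadata updates", "Notifications"],
    "YARN": ["System Services", "User applications"],
    "Zookeeper": ["Routing Tables", "Coordination", "Secret keys", "Auth keys"],
    "Hive": ["Dataset integration", "Schema", "Properties", "Serde"],
    "KMS": ["User Secrets"],
}


def components_for(keyword):
    """Return the sorted component names whose listed uses mention `keyword`.

    The match is a case-insensitive substring test against each use string.
    """
    needle = keyword.lower()
    return sorted(
        name
        for name, uses in COMPONENT_USES.items()
        if any(needle in use.lower() for use in uses)
    )
```

For example, `components_for("logs")` returns `["HDFS", "Kafka"]`, reflecting that logs pass through Kafka and are aggregated on HDFS according to the lists above.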
Created in 2020 by Google Inc.