CDAP Components and Functional Responsibilities
Infrastructure components used by Cask Data Application Platform (CDAP)
CDAP, and the CDAP applications running on it, rely on the following underlying infrastructure components. They are listed in no particular order.
HDFS
HBase
Hive
Kafka
YARN
ZooKeeper
KMS
Sentry ???
Functional use of infrastructure components
This section describes what CDAP uses each infrastructure component for.
HDFS
CDAP Stream
Apache Tephra WAL
Deployed Application Artifact and Dataset Artifact
Aggregated Logs
CDAP Fileset Dataset
YARN distributed cache
Coprocessor jars
HBase
CDAP System data/metadata (e.g., Preferences, Application, Namespace, Artifact…)
Metrics Cube
Lineage
Workflow Statistics
Run Record and Statistics
Checkpoint information
CDAP Table Dataset
Kafka
Logs
Metrics
Audit Logs (will be moved to HBase in 4.0)
Metadata updates (will be moved to HBase in 4.0)
Notifications (will be moved to HBase in 4.x)
YARN
System Services
User applications
ZooKeeper
Routing Tables
Coordination
Secret keys
Auth keys
Hive
Dataset integration
Schema
Properties
SerDe
KMS
User Secrets (e.g., passwords, access tokens)
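The mapping above can be captured as a small lookup table, which makes it easy to see which components are involved in a given concern. The following is only an illustrative sketch: the dictionary `COMPONENT_USES` and the helper `components_for` are hypothetical names invented for this example and are not part of any CDAP API. The entries simply restate the lists in this section.

```python
from typing import Dict, List

# Hypothetical lookup table restating the lists above; not a CDAP API.
COMPONENT_USES: Dict[str, List[str]] = {
    "HDFS": [
        "CDAP Stream",
        "Apache Tephra WAL",
        "Deployed Application Artifact and Dataset Artifact",
        "Aggregated Logs",
        "CDAP Fileset Dataset",
        "YARN distributed cache",
        "Coprocessor jars",
    ],
    "HBase": [
        "CDAP System data/metadata",
        "Metrics Cube",
        "Lineage",
        "Workflow Statistics",
        "Run Record and Statistics",
        "Checkpoint information",
        "CDAP Table Dataset",
    ],
    "Kafka": ["Logs", "Metrics", "Audit Logs", "Metadata updates", "Notifications"],
    "YARN": ["System Services", "User applications"],
    "ZooKeeper": ["Routing Tables", "Coordination", "Secret keys", "Auth keys"],
    "Hive": ["Dataset integration", "Schema", "Properties", "SerDe"],
    "KMS": ["User Secrets"],
}


def components_for(use: str) -> List[str]:
    """Return components with a listed use containing `use` (case-insensitive)."""
    needle = use.lower()
    return [
        name
        for name, uses in COMPONENT_USES.items()
        if any(needle in item.lower() for item in uses)
    ]


if __name__ == "__main__":
    # Which components are involved in log handling?
    print(components_for("logs"))
```

For example, looking up "logs" returns both HDFS (aggregated logs) and Kafka (logs and audit logs), which reflects that log data touches more than one component.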