Core Abstractions
Data Abstractions (Deprecated)
Datasets are abstractions on top of data. They let you access your data through higher-level APIs and generic, reusable Java implementations of common data patterns, instead of requiring you to manipulate the data with low-level APIs.
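To make the idea concrete, here is a minimal sketch in plain Java. It is not CDAP's dataset API: `RawStore` and `CounterTable` are hypothetical names. `RawStore` stands in for a low-level store that only understands byte arrays, and `CounterTable` is a reusable, typed wrapper for the "counter" data pattern, so calling code never touches byte encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class DatasetSketch {

    /** Hypothetical low-level store: only understands raw byte arrays. */
    static class RawStore {
        // Keys are wrapped in ByteBuffer so byte[] contents are compared by value.
        private final Map<ByteBuffer, byte[]> cells = new HashMap<>();

        byte[] get(byte[] key) {
            return cells.get(ByteBuffer.wrap(key));
        }

        void put(byte[] key, byte[] value) {
            // Copy inputs so later changes by the caller cannot corrupt stored data.
            cells.put(ByteBuffer.wrap(key.clone()), value.clone());
        }
    }

    /** Hypothetical reusable "dataset": a typed counter table built on RawStore. */
    static class CounterTable {
        private final RawStore store;

        CounterTable(RawStore store) {
            this.store = store;
        }

        /** Adds delta to the counter for key and returns the new value. */
        long increment(String key, long delta) {
            long updated = get(key) + delta;
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            store.put(k, ByteBuffer.allocate(Long.BYTES).putLong(updated).array());
            return updated;
        }

        /** Returns the current counter value, or 0 if the key is absent. */
        long get(String key) {
            byte[] raw = store.get(key.getBytes(StandardCharsets.UTF_8));
            return raw == null ? 0L : ByteBuffer.wrap(raw).getLong();
        }
    }

    public static void main(String[] args) {
        CounterTable clicks = new CounterTable(new RawStore());
        clicks.increment("home", 1);
        clicks.increment("home", 2);
        System.out.println(clicks.get("home"));    // 3
        System.out.println(clicks.get("missing")); // 0
    }
}
```

The design point is that the encoding decisions (UTF-8 keys, 8-byte big-endian values) live in one reusable class rather than being repeated in every program that counts something.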
Application Abstraction
Applications hide the low-level details of individual programming paradigms and runtimes, while giving you access to powerful services that CDAP provides, such as distributed transactions, service discovery, and the ability to dynamically scale processing units.
Applications are abstracted away from the platform that runs them. When you deploy an application to a specific CDAP installation and run it, CDAP injects the appropriate implementations of all services and program runtimes; the application does not need to change based on the environment. This allows you to develop applications in one environment, such as on your laptop using a CDAP Sandbox for testing, and then seamlessly deploy them to a different environment, such as your distributed staging cluster.
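The injection idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not how CDAP wires its runtimes internally: `MessageSink`, `InMemorySink`, `ClusterSink`, and `GreetingApp` are all hypothetical. The application is written only against an interface, and a stand-in "platform" in `main` chooses which implementation to inject for the current environment.

```java
import java.util.ArrayList;
import java.util.List;

public class InjectionSketch {

    /** Hypothetical service contract that the application codes against. */
    interface MessageSink {
        void publish(String message);

        String describe();
    }

    /** Hypothetical sandbox implementation: keeps messages in memory. */
    static class InMemorySink implements MessageSink {
        final List<String> published = new ArrayList<>();

        public void publish(String message) {
            published.add(message);
        }

        public String describe() {
            return "in-memory (sandbox)";
        }
    }

    /** Hypothetical cluster implementation; a real one would call remote brokers. */
    static class ClusterSink implements MessageSink {
        int sent = 0;

        public void publish(String message) {
            sent++; // stand-in for a network call
        }

        public String describe() {
            return "distributed (cluster)";
        }
    }

    /** The application: identical code whichever sink the platform injects. */
    static class GreetingApp {
        private final MessageSink sink;

        GreetingApp(MessageSink sink) {
            this.sink = sink;
        }

        void run(String name) {
            sink.publish("Hello, " + name);
        }
    }

    public static void main(String[] args) {
        // The "platform" picks the implementation per environment; the app is unchanged.
        boolean onCluster = args.length > 0 && args[0].equals("cluster");
        MessageSink sink = onCluster ? new ClusterSink() : new InMemorySink();
        new GreetingApp(sink).run("CDAP");
        System.out.println("Ran against " + sink.describe());
    }
}
```

Run with no arguments, this prints `Ran against in-memory (sandbox)`; passing `cluster` swaps the implementation without any change to `GreetingApp`.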
Data and Applications Combined
With your data represented in CDAP as datasets, you can process that data in real time or in batch using a program (MapReduce, Spark, Workflow), and you can serve that data to external clients using a Service.
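The pattern of one dataset shared by a processing program and a serving endpoint can be sketched in plain Java. This does not use CDAP's MapReduce, Spark, or Service APIs. `WordCounts`, `batchCount`, and `handleCountRequest` are hypothetical: `batchCount` plays the role of a batch program that writes to the dataset, and `handleCountRequest` plays the role of a Service handler that reads the same dataset to answer a client.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchAndServeSketch {

    /** Hypothetical shared dataset: word -> count. */
    static class WordCounts {
        private final Map<String, Long> counts = new HashMap<>();

        void add(String word, long n) {
            counts.merge(word, n, Long::sum);
        }

        long get(String word) {
            return counts.getOrDefault(word, 0L);
        }
    }

    /** Hypothetical batch program: processes the whole input in one pass. */
    static void batchCount(String[] lines, WordCounts out) {
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) { // split can yield "" for leading whitespace
                    out.add(word, 1);
                }
            }
        }
    }

    /** Hypothetical service handler: answers a client query from the same dataset. */
    static String handleCountRequest(WordCounts data, String word) {
        // Hand-built JSON with no escaping; adequate only for this illustration.
        return "{\"word\":\"" + word + "\",\"count\":" + data.get(word) + "}";
    }

    public static void main(String[] args) {
        WordCounts counts = new WordCounts();
        batchCount(new String[] {"Hello CDAP", "hello Hadoop"}, counts);
        System.out.println(handleCountRequest(counts, "hello")); // {"word":"hello","count":2}
    }
}
```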
This diagram shows how the CDAP components relate in an Apache Hadoop installation:
Created in 2020 by Google Inc.