...

An application template is a user-defined, reusable, reconfigurable pattern of an application. It is parameterized by a configuration that can be changed at deployment time. It provides a generic version of an application that can be repurposed, instead of requiring the ongoing creation of specialized applications. The reconfigurability and modularization of the application are exposed through plugins. CDAP provides its own system-defined application templates, and new user-defined templates can be added that use the DAG interface of the Pipeline Studio. Application templates are configured using the CDAP Pipeline Studio and deployed as applications into a Hadoop cluster.

...

  • A definition of the different types of processing supported by the template. These can include MapReduce, Service, Spark, Spark Streaming, Worker, and Workflow. In the case of a CDAP data pipeline, it (currently) can include MapReduce, Spark, Worker, and Workflow.

  • A planner is optional; however, CDAP includes a planner that translates a logical pipeline into a physical pipeline and pieces together all of the processing components supported by the template.
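
As a rough illustration of what "translating a logical pipeline into a physical pipeline" can mean, the sketch below orders the stages of a logical DAG so that every stage runs after its upstream stages. This is a toy example, not CDAP's planner: the `plan` function and the stage names are hypothetical, and a real planner would also decide which stages are grouped into which MapReduce or Spark programs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan(connections):
    """Return stage names in an order that respects every connection.

    `connections` is a list of (from_stage, to_stage) pairs describing the
    logical pipeline as a DAG. Hypothetical helper: it only orders stages
    and does not assign them to processing engines.
    """
    sorter = TopologicalSorter()
    for upstream, downstream in connections:
        # add(node, *predecessors): downstream depends on upstream.
        sorter.add(downstream, upstream)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(sorter.static_order())


# Hypothetical three-stage logical pipeline.
logical = [("source", "transform"), ("transform", "sink")]
print(plan(logical))
```

A cycle in the connections makes `static_order()` raise `graphlib.CycleError`, which is one simple way a planner could reject an invalid pipeline.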

...

A plugin is a customizable module, exposed and used by an application template. It simplifies adding new features or extending the capability of an application. Plugin implementations are based on interfaces exposed by the application templates. Currently, CDAP data pipeline application templates expose Source, Transform, and Sink interfaces, which have multiple implementations.
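
The relationship between template-defined interfaces and plugin implementations can be sketched as follows. This is a minimal Python analogue and not CDAP's actual (Java) plugin API: the class names, method names, and the `run` helper are all hypothetical, chosen only to show how several implementations can sit behind Source, Transform, and Sink interfaces.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Hypothetical interface: produces records."""

    @abstractmethod
    def read(self):
        ...


class Transform(ABC):
    """Hypothetical interface: maps one record to another."""

    @abstractmethod
    def apply(self, record):
        ...


class Sink(ABC):
    """Hypothetical interface: consumes records."""

    @abstractmethod
    def write(self, record):
        ...


class ListSource(Source):
    def __init__(self, records):
        self.records = records

    def read(self):
        return iter(self.records)


class UppercaseName(Transform):
    def apply(self, record):
        # Return a copy with the "name" field upper-cased.
        return {**record, "name": record["name"].upper()}


class ListSink(Sink):
    def __init__(self):
        self.written = []

    def write(self, record):
        self.written.append(record)


def run(source, transforms, sink):
    """Push every record from the source through the transforms to the sink."""
    for record in source.read():
        for transform in transforms:
            record = transform.apply(record)
        sink.write(record)


sink = ListSink()
run(ListSource([{"name": "ada"}]), [UppercaseName()], sink)
print(sink.written)
```

Because `run` depends only on the three interfaces, any of the implementations can be swapped for another without changing the surrounding code, which is the property the template relies on.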

...

The Batch Data Pipeline and Realtime Data Pipeline application templates expose three plugin types: source, transform, and sink. The Batch Data Pipeline application template exposes additional plugin types, including aggregate, compute, and model.

...

Internals of CDAP Data Pipelines

Building a Data Pipeline

Here is how the Pipeline Studio works with CDAP to build a data pipeline, beginning with a user creating a new data pipeline in the Pipeline Studio. First, the components of the Pipeline Studio:

...

  • User Selects an Application Template

    A user building a pipeline within the Pipeline Studio will select a pipeline type, which is essentially picking an application template.

  • Retrieve the Plugin types supported by the selected Application Template

    Once a user has selected an application template, the Pipeline Studio makes a request to CDAP for the different plugin types supported by the application template. In the case of the Batch Data Pipeline, CDAP will return Source, Transform, and Sink as plugin types. This allows the Pipeline Studio to construct the plugin palette in the left sidebar of the UI.

  • Retrieve the Plugin definitions for each Plugin type

    The Pipeline Studio then makes one request to CDAP per plugin type, asking for all plugin implementations available for that type.

  • User Builds the data pipeline

    The user then uses the Pipeline Studio canvas to create a data pipeline with the available plugins.

  • Validation of the data pipeline

    The user can request at any point that the pipeline be validated. This request is translated into a Microservices call to CDAP, which passes it to the application template; the template then checks whether the pipeline is valid.

  • Application Template Configuration Generation

    As the user is building a pipeline, the Pipeline Studio is building a JSON configuration that, when completed, will be passed to the application template to configure and create an application that is deployed to the cluster.

  • Converting a logical into a physical data pipeline and registering the Application

    When the user publishes the pipeline, the configuration generated by the Pipeline Studio is passed to the application template as part of the creation of the application. The application template takes the configuration, passes it through a planner to create a physical layout, generates an application specification from that layout, and registers the specification with CDAP as an application.

  • Managing the physical pipeline

    Once the application is registered with CDAP, the data pipeline is ready to be started. If it was scheduled, the schedule is ready to be enabled. The CDAP UI then uses the CDAP Microservices to manage the pipeline's lifecycle. The pipeline can be managed from CDAP through the CDAP UI, by using the CDAP CLI, or by using the Microservices.

  • Monitoring the physical pipeline

    Because CDAP pipelines are run as CDAP applications, their logs and metrics are aggregated by the CDAP system and available using Microservices.
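
The steps above can be sketched from the client's point of view. The code below makes no network calls; it only builds the request paths and JSON bodies a client such as the Pipeline Studio might send. Everything here is an assumption to be checked against the CDAP Microservices reference for your version: the path shapes are modeled on the artifact Microservices, the `stages`/`connections` layout approximates the pipeline configuration, and the function names, plugin names, version string, and validation rules are hypothetical.

```python
import json


def plugin_types_path(namespace, artifact, version):
    # Assumed path for listing the plugin types an artifact supports.
    return (f"/v3/namespaces/{namespace}/artifacts/{artifact}"
            f"/versions/{version}/extensions")


def plugins_path(namespace, artifact, version, plugin_type):
    # Assumed path for listing all plugins of one plugin type.
    return f"{plugin_types_path(namespace, artifact, version)}/{plugin_type}"


def build_config(stages, connections):
    """Assemble the configuration as the user draws the pipeline."""
    return {
        "stages": stages,
        "connections": [{"from": a, "to": b} for a, b in connections],
    }


def validate(config):
    """Toy client-side check; returns a list of problems (empty if none)."""
    names = [stage["name"] for stage in config["stages"]]
    problems = []
    if len(names) != len(set(names)):
        problems.append("stage names must be unique")
    for connection in config["connections"]:
        for end in ("from", "to"):
            if connection[end] not in names:
                problems.append(f"unknown stage: {connection[end]}")
    return problems


def app_request(artifact, version, config):
    """Body that pairs the template (artifact) with its configuration."""
    return {
        "artifact": {"name": artifact, "version": version, "scope": "SYSTEM"},
        "config": config,
    }


# Hypothetical plugins and version, for illustration only.
stages = [
    {"name": "in", "plugin": {"name": "ExampleSource", "type": "batchsource"}},
    {"name": "clean", "plugin": {"name": "ExampleTransform", "type": "transform"}},
    {"name": "out", "plugin": {"name": "ExampleSink", "type": "batchsink"}},
]
config = build_config(stages, [("in", "clean"), ("clean", "out")])
if not validate(config):
    print(json.dumps(app_request("cdap-data-pipeline", "0.0.0", config), indent=2))
```

Keeping validation separate from configuration building mirrors the flow described above, where the user can ask for validation at any point while the configuration is still being assembled.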