...

Data pipelines are represented as a series of stages arranged in a directed acyclic graph (DAG), which forms a one-way pipeline. Stages, the "nodes" of the pipeline graph, fall broadly into six categories:

  • Sources

  • Transformations

  • Analytics

  • Actions

  • Sinks

  • Error Handling
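Because the stages must form a DAG, every pipeline has at least one order in which each stage runs after all of its upstream stages, and a cycle would make the graph invalid. The sketch below illustrates this idea only. It is not CDAP's API or its actual scheduler. The `PipelineDag` class, its methods, and the stage names are hypothetical. It uses Kahn's topological sort to derive an execution order and to reject cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a pipeline as a DAG of named stages (not the CDAP API). */
public class PipelineDag {
    // Insertion-ordered adjacency list: stage name -> downstream stage names.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public void addStage(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    /** Adds a one-way connection from an upstream stage to a downstream stage. */
    public void connect(String from, String to) {
        addStage(from);
        addStage(to);
        edges.get(from).add(to);
    }

    /** Kahn's algorithm: returns stages in an order respecting every edge, or throws on a cycle. */
    public List<String> executionOrder() {
        // Count incoming edges for every stage.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String stage : edges.keySet()) {
            inDegree.putIfAbsent(stage, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // Stages with no upstream dependencies (e.g. sources) are ready first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String stage = ready.poll();
            order.add(stage);
            for (String next : edges.get(stage)) {
                // A downstream stage becomes ready once all its inputs are scheduled.
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Any stage left unscheduled sits on a cycle, so the graph is not a DAG.
        if (order.size() != edges.size()) {
            throw new IllegalStateException("Pipeline contains a cycle; stages must form a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        PipelineDag dag = new PipelineDag();
        dag.connect("FileSource", "CloneRecord");
        dag.connect("CloneRecord", "AvroSink");
        dag.connect("CloneRecord", "ErrorHandler");
        System.out.println(dag.executionOrder());
        // prints [FileSource, CloneRecord, AvroSink, ErrorHandler]
    }
}
```

Adding a connection such as `dag.connect("AvroSink", "FileSource")` would close a loop, and `executionOrder()` would then throw. That is the one-way property the DAG guarantees.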

Sources are the databases, files, or real-time streams from which you obtain your data. They let you ingest data through a simple UI, so you don't have to code low-level connections yourself.

Transformations let you manipulate data once you have ingested it. For example, you can clone a record, format JSON, or even write custom transforms using the JavaScript plugin.
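Conceptually, a transformation receives one input record and emits zero or more output records. The minimal Java sketch below models the "clone a record" case. The `RecordTransform` interface, the `cloneRecord` factory, and the use of a `Map` as a record are all hypothetical stand-ins and not CDAP's plugin API. Each clone is an independent copy, so downstream stages can modify one without affecting the others.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class CloneTransformSketch {
    /** Hypothetical stand-in for a record transform: one input, zero or more emitted outputs. */
    public interface RecordTransform {
        void transform(Map<String, Object> input, Consumer<Map<String, Object>> emitter);
    }

    /** Emits each incoming record `copies` times, each as a separate shallow copy. */
    public static RecordTransform cloneRecord(int copies) {
        return (input, emitter) -> {
            for (int i = 0; i < copies; i++) {
                emitter.accept(new LinkedHashMap<>(input));
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 7);
        record.put("name", "alice");

        // Collect whatever the transform emits.
        List<Map<String, Object>> out = new ArrayList<>();
        cloneRecord(2).transform(record, out::add);
        System.out.println(out); // prints [{id=7, name=alice}, {id=7, name=alice}]
    }
}
```

Modeling output as an emitter rather than a return value lets a single transform drop a record, pass it through, or fan it out into several records.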

...

Finally, data must be written to a Sink. Sinks come in a wide variety of types (for example, Avro, Parquet, or an RDBMS), and the connection is created from a simple UI. Data written to these sinks can then be queried from the CDAP UI or by using a CDAP Microservice.

If a plugin you need does not exist, you might want to build your own, as described in the Developer Documentation.

...

For more information about data pipeline architecture, see How CDAP Data Pipelines Work.

We also have a Data Pipelines How To Guide. Check it out!