...

Data pipelines are represented as a series of stages arranged in a Directed Acyclic Graph (DAG), which forms a one-way pipeline. Stages, the "nodes" in the pipeline graph, fall into six broad categories:

  • Sources

  • Transformations

  • Analytics

  • Actions

  • Sinks

  • Error Handling
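
The DAG structure described above can be illustrated with a minimal Python sketch. This is not CDAP's pipeline format or API; the stage names and the dictionary layout are hypothetical, chosen only to show how stages connected in one direction yield a valid processing order and how a cycle would make the pipeline invalid.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical stage names (not real CDAP plugins).
# Each stage maps to the set of stages it reads from.
pipeline = {
    "file_source": set(),
    "clean_records": {"file_source"},
    "deduplicate": {"clean_records"},
    "table_sink": {"deduplicate"},
    "error_collector": {"clean_records"},
}


def execution_order(stages):
    """Return the stages in an order where each stage follows its inputs."""
    return list(TopologicalSorter(stages).static_order())


def is_valid_pipeline(stages):
    """A pipeline is valid only if its stages form an acyclic graph."""
    try:
        execution_order(stages)
        return True
    except CycleError:
        return False


order = execution_order(pipeline)
print(order)
```

Because the graph is acyclic, data only ever flows forward: a source always appears before the stages that consume it, and a sink never feeds back into an earlier stage.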

Sources are databases, files, or real-time streams from which you obtain your data. They let you ingest data through a simple UI, so you don't have to code low-level connections.

Transformations allow you to manipulate data once you have ingested it. For example, you can clone a record or format JSON. You can even write custom transformations using the JavaScript plugin.
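
To make the two example transformations concrete, here is a small Python sketch of what cloning a record and formatting JSON might look like conceptually. It does not use CDAP's plugin API; the function names, the dictionary-based record, and the `payload` field are assumptions for illustration only.

```python
import copy
import json


def clone_record(record):
    """Emit the original record plus one independent deep copy of it."""
    return [record, copy.deepcopy(record)]


def format_json(record, field):
    """Return a new record in which the JSON string in `field` is parsed."""
    out = dict(record)  # shallow copy so the input record is left unchanged
    out[field] = json.loads(record[field])
    return out


# Hypothetical input record with a JSON string in the "payload" field.
rec = {"id": 1, "payload": '{"a": 1}'}
clones = clone_record(rec)
parsed = format_json(rec, "payload")
```

Each function takes a record in and returns new records out without modifying its input, which mirrors how a transformation stage sits between a source and a sink.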

...

Finally, data must be written to a Sink. Sinks come in a wide variety of formats (Avro, Parquet, or an RDBMS, for example), and the connection is created from a simple UI. Data written to these sinks can then be queried from the CDAP UI or using a CDAP Microservice.

If a plugin you need does not exist, you might want to build your own plugin as described in the Developer Documentation.

...