Introduction
Confluent Cloud is a streaming data service that delivers Apache Kafka as a fully managed service.
User Story(s)
- As a pipeline developer, I would like to stream data in real time from Confluent Cloud, deserialized according to a user-specified schema
- As a pipeline developer, I would like to stream data in real time to Confluent Cloud and perform serialization during event streaming
- As a pipeline developer, I would like to capture records that were not delivered downstream to Confluent Cloud, so that they can be analyzed later (see the sketch following these stories)
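The third story amounts to dead-letter-style capture of failed writes. Below is a minimal sketch of one way to do this with the plain Kafka Java producer; the class name and the in-memory queue are hypothetical illustrations, not the plugin's actual error-handling API.

```java
import java.util.Properties;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hypothetical sketch: divert records that fail delivery into a queue for analysis.
public class ErrorCapturingProducer {
  private final Queue<ProducerRecord<String, String>> failed = new ConcurrentLinkedQueue<>();
  private final KafkaProducer<String, String> producer;

  public ErrorCapturingProducer(Properties props) {
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    this.producer = new KafkaProducer<>(props);
  }

  public void send(ProducerRecord<String, String> record) {
    // The callback fires once the broker acks the write or the send fails permanently.
    producer.send(record, (metadata, exception) -> {
      if (exception != null) {
        failed.add(record); // keep the undelivered record instead of dropping it
      }
    });
  }

  public Queue<ProducerRecord<String, String>> failedRecords() {
    return failed;
  }
}
```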
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Prerequisites
- Kafka Broker: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
- Connect: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
- Java 1.8
Properties
Real-time source
Property | Description | Mandatory or Optional |
---|---|---|
referenceName | Uniquely identifies the source | Yes |
Kafka cluster credential API key | API key used to connect to the Confluent cluster | Yes |
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | Yes |
ZooKeeper quorum | ZooKeeper connect string. Either this or the list of brokers is required | Required if brokers not specified |
Kafka brokers | Comma-separated list of Kafka brokers. Either this or the ZooKeeper quorum is required | Required if ZooKeeper quorum not specified |
Partitions | Number of partitions | Yes |
Initial offset | The initial offset for the partition | No |
Kafka topic | List of topics to listen to for streaming | Yes |
Schema registry URL | URL endpoint of the schema registry, either on Confluent Cloud or self-hosted | No |
Schema registry API key | API key used to connect to the schema registry | No |
Schema registry secret | Secret used to connect to the schema registry | No |
Format | Format of the Kafka event. Any format supported by CDAP is supported. By default, the output is the key and value as bytes | No |
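To make the mapping from these properties to client configuration concrete, here is a minimal consumer sketch. It assumes the plain Kafka Java client with Confluent's Avro deserializer on the classpath; the class name, consumer group, and every `<...>` value are placeholders corresponding to the table entries above, not part of the plugin itself.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConfluentSourceSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Kafka brokers (comma-separated list)
    props.put("bootstrap.servers", "<broker-1>:9092,<broker-2>:9092");
    // Kafka cluster credential API key and secret, passed via SASL/PLAIN
    props.put("security.protocol", "SASL_SSL");
    props.put("sasl.mechanism", "PLAIN");
    props.put("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"<cluster-api-key>\" password=\"<cluster-api-secret>\";");
    // Schema registry URL, API key, and secret
    props.put("schema.registry.url", "<schema-registry-url>");
    props.put("basic.auth.credentials.source", "USER_INFO");
    props.put("basic.auth.user.info", "<sr-api-key>:<sr-api-secret>");
    // Format: Avro via the schema registry; the default would be raw bytes
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("group.id", "pipeline-dev");

    try (KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props)) {
      // Kafka topic(s) to listen to for streaming
      consumer.subscribe(Collections.singletonList("<kafka-topic>"));
      while (true) {
        ConsumerRecords<String, Object> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, Object> record : records) {
          System.out.printf("partition=%d offset=%d value=%s%n",
              record.partition(), record.offset(), record.value());
        }
      }
    }
  }
}
```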
Real-time sink
Property | Description | Type | Mandatory |
---|---|---|---|
Reference Name | Uniquely identifies the sink | String | Yes |
Kafka cluster credential API key | API key used to connect to the Confluent cluster | String | Yes |
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | String | Yes |
Kafka brokers | Comma-separated list of Kafka brokers | String | Yes |
Async | Specifies whether events are written to the broker asynchronously or synchronously | Select | Yes |
Partition field | Specifies the input field used to determine the partition id for the event | Int or Long | Yes |
Key field | Specifies the input field to use as the key for the event published into Kafka | String | Yes |
Kafka topic | List of topics to which the data should be published | String | Yes |
Format | Specifies the format of the event published to Confluent Cloud | String | Yes |
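As with the source, a minimal producer sketch can illustrate how these sink properties map onto client configuration, including the asynchronous versus synchronous write modes named by the Async property. It assumes the plain Kafka Java client; the class name and all `<...>` values are placeholders.

```java
import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConfluentSinkSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "<broker-1>:9092");  // Kafka brokers
    // Kafka cluster credential API key and secret, passed via SASL/PLAIN
    props.put("security.protocol", "SASL_SSL");
    props.put("sasl.mechanism", "PLAIN");
    props.put("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"<cluster-api-key>\" password=\"<cluster-api-secret>\";");
    props.put("key.serializer", StringSerializer.class.getName());   // Key field as string
    props.put("value.serializer", StringSerializer.class.getName()); // Format: string here

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> record =
          new ProducerRecord<>("<kafka-topic>", "<key-field-value>", "<event-body>");

      // Async: fire-and-forget, with a callback to observe delivery failures.
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          exception.printStackTrace();
        }
      });

      // Sync: block until the broker acknowledges the write.
      Future<RecordMetadata> ack = producer.send(record);
      RecordMetadata metadata = ack.get();
      System.out.printf("wrote to partition=%d offset=%d%n",
          metadata.partition(), metadata.offset());
    }
  }
}
```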
References
https://docs.confluent.io/current/connect/kafka-connect-bigquery/index.html
https://docs.confluent.io/current/cloud/connectors/cc-gcs-sink-connector.html#cc-gcs-connect-sink
https://docs.confluent.io/current/quickstart/cloud-quickstart/index.html
https://docs.cask.co/cdap/4.2.0/en/developer-manual/pipelines/plugins/sinks/kafkaproducer-realtimesink.html
https://docs.cask.co/cdap/4.2.0/en/developer-manual/pipelines/plugins/sources/kafka-realtimesource.html
https://docs.confluent.io/current/cloud/limits.html#cloud-limits
Design / Implementation Tips
Design
Approach(es)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2