Reading from and writing to SSL-enabled Apache Kafka

CDAP can connect to an SSL-enabled Kafka instance if you pass additional Kafka configuration properties in the CDAP data pipeline properties.

Before you begin

Enable SSL for all brokers on the Apache Kafka server. For more information, see Configuring Kafka Brokers in the Apache Kafka documentation.
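For reference, broker-side SSL is typically enabled with settings like the following in each broker's server.properties. This is a sketch only: the listener port, file paths, and passwords are illustrative placeholders, and the authoritative list of properties is in the Kafka documentation.

# Illustrative broker-side SSL settings (placeholder values, not values from this guide)
listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
ssl.truststore.password=test1234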

Step 1. Upload the Kafka SSL truststore and keystore files to the Dataproc cluster

For a successful SSL connection, the SSL truststore and keystore files must be present on every node of the Dataproc cluster that runs the pipeline. This is usually done with a Dataproc initialization action.

To upload the Kafka truststore and keystore files to Dataproc, follow these steps:

  1. Copy the Kafka truststore and keystore files to a Cloud Storage bucket in the same project where you’ll run the pipeline (see the gsutil example after the script below).

  2. Create a Dataproc initialization action: write a shell script and upload it to Cloud Storage. Dataproc runs this script on each node during cluster initialization. In the script, you can use the gsutil command to copy the SSL truststore and keystore files from the Cloud Storage bucket to a path on the Dataproc cluster.

The following example shows a shell script named dataproc-init.sh that copies files from the Cloud Storage bucket into the folder /usr/ on the ephemeral Dataproc cluster.

#!/bin/bash
sudo gsutil cp gs://testkafkassl/kafka.client.keystore.jks /usr/
sudo gsutil cp gs://testkafkassl/client.truststore.jks /usr/
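To stage the files for steps 1 and 2, gsutil copies like the following can be run from the machine that holds the truststore, keystore, and script. The local file names here are assumed to match the ones the script copies.

# Upload the truststore, keystore, and initialization script to the bucket
gsutil cp kafka.client.keystore.jks client.truststore.jks gs://testkafkassl/
gsutil cp dataproc-init.sh gs://testkafkassl/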

Step 2. Add additional configuration properties in the CDAP data pipeline

In Pipeline Studio, create a batch or streaming pipeline with a Kafka plugin (Kafka Consumer Batch Source, Kafka Consumer Streaming Source, or Kafka Producer Sink).

Configure the plugin to access the Kafka truststore and keystore files stored on each node of the Dataproc cluster.

In the Kafka plugin, add the following Kafka properties in the Additional Kafka Consumer Properties or Additional Kafka Producer Properties section:

  • security.protocol=SSL

  • ssl.truststore.location=/usr/client.truststore.jks

  • ssl.truststore.password=test1234

  • ssl.keystore.location=/usr/kafka.client.keystore.jks

  • ssl.keystore.password=test1234

  • ssl.key.password=test1234

Important: The truststore and keystore locations must be the paths on the ephemeral Dataproc cluster where the initialization action uploads the files during provisioning (see Step 1).
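As an optional sanity check, you can verify the SSL connection from a cluster node before running the pipeline. This is a sketch with assumptions: the Kafka command-line tools are on the node's PATH, a topic named test exists, and KAFKA_BROKER_HOST:9093 stands in for your broker's SSL listener address.

# Write the same SSL settings to a client properties file
cat > /tmp/client-ssl.properties <<EOF
security.protocol=SSL
ssl.truststore.location=/usr/client.truststore.jks
ssl.truststore.password=test1234
ssl.keystore.location=/usr/kafka.client.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
EOF
# Read one message to confirm the SSL handshake succeeds
kafka-console-consumer.sh --bootstrap-server KAFKA_BROKER_HOST:9093 \
  --topic test --consumer.config /tmp/client-ssl.properties --max-messages 1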

Step 3. Configure the CDAP compute profile to run the shell script

In CDAP, after you deploy the pipeline, configure the compute profile to use the initialization action that you created.

To configure the compute profile, follow these steps:

  1. Open the deployed pipeline.

  2. On the Deployed Pipeline page, click Configure.

  3. Select the Dataproc compute profile that you want to use to run the pipeline.

  4. Click Customize > Advanced Settings.

  5. In the Initialization Actions field, enter the Cloud Storage path of the initialization action that you created, for example gs://testkafkassl/dataproc-init.sh.

After you add the initialization action, it appears in the runtime arguments under the pipeline’s Run profile.

Step 4. Run the pipeline

On the Deployed Pipeline page, click Run. When the pipeline runs, CDAP provisions the ephemeral Dataproc cluster, the initialization action copies the truststore and keystore files to each node, and the pipeline reads from or writes to the SSL-enabled Kafka instance.
