CDAP Release 6.9.1

Release date: June 6, 2023

Features

PLUGIN-1537: CDAP supports the following improvements and changes for real-time pipelines with a single Pub/Sub streaming source and no Windower plugins:

  • The Pub/Sub streaming source has built-in support for at-least-once processing, so enabling Spark checkpointing isn't required.

  • The Pub/Sub streaming source creates a Pub/Sub snapshot at the beginning of each batch and removes it at the end of each batch.

  • Creating Pub/Sub snapshots incurs a cost. For more information, see Pub/Sub pricing.

  • Snapshot creation can be monitored using Cloud Audit Logs.
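The snapshot-per-batch behavior described above can be sketched as follows. This is a minimal in-memory simulation, not CDAP's implementation and not the Pub/Sub client API. FakeSubscription, run_batch, and their methods are hypothetical. Pulling a message is treated as acknowledging it, and rewinding to the snapshot after a failed batch is an assumed recovery step. The sketch shows why a replayed batch can deliver a message twice, which at-least-once processing permits.

```python
"""Hypothetical in-memory model of snapshot-per-batch processing.

FakeSubscription, create_snapshot, seek, delete_snapshot and run_batch are
illustrative stand-ins; they are not CDAP code or the google-cloud-pubsub API.
"""
from dataclasses import dataclass, field


@dataclass
class FakeSubscription:
    messages: list
    cursor: int = 0  # index of the next message to deliver
    snapshots: dict = field(default_factory=dict)

    def create_snapshot(self, name: str) -> None:
        # Record the current delivery position under the snapshot name.
        self.snapshots[name] = self.cursor

    def seek(self, name: str) -> None:
        # Rewind delivery to the position captured by the snapshot.
        self.cursor = self.snapshots[name]

    def delete_snapshot(self, name: str) -> None:
        del self.snapshots[name]

    def pull(self, max_messages: int) -> list:
        batch = self.messages[self.cursor:self.cursor + max_messages]
        self.cursor += len(batch)  # simplification: pulled == acknowledged
        return batch


def run_batch(sub, batch_id, sink, max_messages, fail=False):
    """Process one micro-batch; return True if it completed."""
    snap = f"batch-{batch_id}"
    sub.create_snapshot(snap)          # snapshot at the beginning of the batch
    try:
        batch = sub.pull(max_messages)
        if fail:
            sink.extend(batch[:1])     # partial write, then a simulated failure
            raise RuntimeError("simulated batch failure")
        sink.extend(batch)
        return True
    except RuntimeError:
        sub.seek(snap)                 # assumed recovery: replay the whole batch
        return False
    finally:
        sub.delete_snapshot(snap)      # snapshot removed at the end of the batch


sub = FakeSubscription(messages=["a", "b", "c", "d"])
sink = []
run_batch(sub, 1, sink, 2, fail=True)  # writes "a", fails, rewinds
run_batch(sub, 2, sink, 2)             # replays "a" and "b"
run_batch(sub, 3, sink, 2)             # processes "c" and "d"
print(sink)  # ['a', 'a', 'b', 'c', 'd']
```

The duplicate "a" in the sink is the expected cost of at-least-once delivery; downstream sinks that must not see duplicates need to be idempotent.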

CDAP-20543: CDAP version 6.9.1 supports the Dataproc 2.1 image, which runs on Java 11. If you change the Dataproc image to 2.1, the JDBC drivers used by the database plugins must be compatible with Java 11.

CDAP-20228: CDAP supports source control management with GitHub.

Improvements

CDAP-20436: Added the ability to aggregate pipeline metrics in the RuntimeClientService by setting app.program.runtime.monitor.metrics.aggregation.enabled to true in cdap-site.xml. This slightly increases the resource usage of the RuntimeClientService but decreases the load on the CDAP metrics service. The scalability benefit for the metrics service grows with the number of Spark executors per pipeline.
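The setting above is a single property entry in cdap-site.xml. The sketch below adds or updates it with Python's standard xml.etree.ElementTree module. It assumes the Hadoop-style layout of a <configuration> root containing <property> elements with <name> and <value> children; the set_property helper is hypothetical.

```python
import xml.etree.ElementTree as ET

PROPERTY = "app.program.runtime.monitor.metrics.aggregation.enabled"


def set_property(xml_text: str, name: str, value: str) -> str:
    """Return cdap-site.xml content with `name` set to `value`.

    Assumes <configuration><property><name/><value/></property>...; updates an
    existing entry in place or appends a new <property> element.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new property element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site = "<configuration></configuration>"
updated = set_property(site, PROPERTY, "true")
print(updated)
```

ElementTree drops XML comments and the XML declaration when it round-trips a file, so editing cdap-site.xml by hand may be preferable if those need to be preserved.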

CDAP-20455: Streaming pipelines that use Spark checkpointing can use macros if the cdap.streaming.allow.source.macros runtime argument is set to true. In this case, macros are evaluated only for the first run, and the evaluated values are stored in the checkpoint. They aren't re-evaluated in later runs.
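The evaluate-once behavior can be modeled in a few lines. This is a hypothetical sketch, not CDAP internals. resolve_source_config, the dictionary used as the checkpoint, and the choice to raise an error when the flag is not set are illustrative assumptions; only the ${name} macro syntax and the runtime argument name come from CDAP. It shows that a second run with a different argument value still gets the value stored on the first run.

```python
import re

MACRO = re.compile(r"\$\{([^}]+)\}")
ALLOW_FLAG = "cdap.streaming.allow.source.macros"


def evaluate(template: str, args: dict) -> str:
    # Replace each ${name} with the matching runtime-argument value.
    return MACRO.sub(lambda m: args[m.group(1)], template)


def resolve_source_config(template: str, args: dict, checkpoint: dict) -> str:
    """Hypothetical: evaluate macros once and reuse the checkpointed result."""
    if args.get(ALLOW_FLAG) != "true":
        # Assumed behavior for this sketch when the flag is not set.
        raise ValueError("source macros are not enabled for this pipeline")
    if "source_config" not in checkpoint:
        # First run: evaluate the macros and store the result in the checkpoint.
        checkpoint["source_config"] = evaluate(template, args)
    # Later runs: the stored value is returned; new argument values are ignored.
    return checkpoint["source_config"]


template = "projects/p/subscriptions/${sub}"
checkpoint = {}
first = resolve_source_config(template, {ALLOW_FLAG: "true", "sub": "orders"}, checkpoint)
second = resolve_source_config(template, {ALLOW_FLAG: "true", "sub": "returns"}, checkpoint)
print(first, second)
```

To pick up a new macro value, the checkpoint itself would have to be reset, since the stored value takes precedence over the new runtime arguments.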

CDAP-20466: Added a Lifecycle microservices endpoint to delete the streaming application state for the Kafka Consumer Streaming and Google Cloud Pub/Sub Streaming sources.

CDAP-20488: Improved performance of replication pipelines by caching schema objects for data events.
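The general technique behind this change, reusing one parsed schema object for every event that shares a schema, can be illustrated with functools.lru_cache. This is not CDAP's replication code. Schema and parse_schema are hypothetical, and the JSON schema format is an assumption made for the example.

```python
import json
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Schema:
    name: str
    fields: tuple


@lru_cache(maxsize=1024)
def parse_schema(schema_json: str) -> Schema:
    # Potentially expensive parse; runs once per distinct schema string.
    doc = json.loads(schema_json)
    return Schema(doc["name"], tuple(f["name"] for f in doc["fields"]))


schema_json = json.dumps({"name": "orders", "fields": [{"name": "id"}, {"name": "amount"}]})
events = [{"schema": schema_json, "row": [i, i * 10]} for i in range(3)]
parsed = [parse_schema(e["schema"]) for e in events]
print(parse_schema.cache_info())  # hits=2, misses=1
```

Three events with the same schema trigger one parse and two cache hits, which is where the saving comes from when many data events share a schema.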

CDAP-20500: Added a launch mode setting to the Dataproc provisioners. When set to Client mode, the program launcher runs in the Dataproc job itself instead of as a separate YARN application. This reduces the start-up time and the cluster resources required, but might cause failures if the launcher needs more memory, for example, when an action plugin loads data into memory.

CDAP-20504: Removed duplicate backend calls when a program reads from the secure store.

CDAP-20567: Added support to upgrade Pipeline Post-run Action (Pipeline Alerts) plugins during the pipeline upgrade process.

Fixed

CDAP-18394: Fixed an issue where CDAP checked GET permission on a namespace that didn't exist yet during the namespace creation flow.

CDAP-20216: Fixed an issue where, if a replication job or pipeline was deleted in CDAP, Dataproc continued running the job when it couldn't communicate with the CDAP instance.

CDAP-20568: Fixed an issue that caused pipelines whose triggers use runtime arguments to fail after the instance was upgraded to CDAP 6.8 or later, including 6.9.0.

CDAP-20597: Fixed an issue where arguments set by actions and pipeline triggers didn't overwrite runtime arguments. To get this behavior, users must add the following runtime argument: system.skip.normal.macro.evaluation=true.
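The precedence rule in this fix can be sketched as a dictionary merge. The merge_arguments helper and the argument names and values other than system.skip.normal.macro.evaluation are hypothetical. The sketch models only the documented outcome: with the flag set, action and trigger values win over runtime arguments of the same name; without it, existing runtime arguments are kept. Adding new keys in the unflagged case is an assumption.

```python
SKIP_FLAG = "system.skip.normal.macro.evaluation"


def merge_arguments(runtime_args: dict, action_or_trigger_args: dict) -> dict:
    """Hypothetical model of argument precedence for CDAP-20597."""
    merged = dict(runtime_args)
    if runtime_args.get(SKIP_FLAG) == "true":
        # Documented behavior with the flag: action/trigger values overwrite.
        merged.update(action_or_trigger_args)
    else:
        # Without the flag, existing runtime arguments are not overwritten;
        # adding keys that are absent is an assumption of this sketch.
        for key, value in action_or_trigger_args.items():
            merged.setdefault(key, value)
    return merged


runtime = {SKIP_FLAG: "true", "input.path": "gs://example-bucket/default"}
trigger = {"input.path": "gs://example-bucket/from-trigger", "run.date": "2023-06-06"}
with_flag = merge_arguments(runtime, trigger)
without_flag = merge_arguments({"input.path": "gs://example-bucket/default"}, trigger)
print(with_flag["input.path"], without_flag["input.path"])
```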

CDAP-20655: Fixed an issue that caused the Pipeline Studio page to show an incorrect count of triggers.

CDAP-20660: Fixed an issue that caused the Trigger's Payload Config to be missing in the UI for an upgraded instance.

PLUGIN-1594: Fixed an issue where the initial offset wasn't considered in the Kafka batch source.

Deprecated

CDAP-20667: All datasets except FileSet and ExternalDataset are deprecated and will be removed in a future release. All the deprecated datasets use the Table dataset in some form, which only works for programs running with the native provisioner on very old Hadoop releases.


Created in 2020 by Google Inc.