CDAP Release 6.6.0

Release Date: February 24, 2022

New Features

CDAP-18653: Added one-click autoscaling for Dataproc compute profiles.

Enhancements

PLUGIN-994: Added support for Fetch Size to the following plugins with the new limit of 1000 rows:

CDAP-18738: For Dataproc cluster reuse, the runtime property system.profile.properties.clusterReuseEnabled is no longer required. The default Max Idle Time is set to 30 minutes to prevent accidental cluster leaks.

CDAP-18725: Added more details for pipeline success and failure metrics.

CDAP-18712: Added ability to limit published lineage messages to a configurable size to avoid out of memory errors due to large lineages.

CDAP-18651: Preview runners no longer perform any kind of access enforcement.

CDAP-18647: Added a new limit of 5000 records for previewing data in the Pipeline Studio.

CDAP-18621: Added new default value of 30 minutes for the Dataproc profile Max Idle Time property. Previously, Max Idle Time had no default value.

CDAP-18836: Added temporary namespace UPDATE enforcement for pipeline connections. 

CDAP-18798: Added the system.program.starting.delay.seconds metric to measure the time a program takes to transition from the provisioning state to the running state.

CDAP-18714: Added metrics for API call latency.

CDAP-18725: Added new tags (Provisioner, Cluster Status, Existing Status) to the existing program failure/success metrics.

CDAP-17772: Added authn/z between internal system services via token verification.

Instance Stability and Memory Usage

CDAP-18696: Added a new Applications parameter (app.max.concurrent.launching) to cdap-default.xml to control back pressure on pipeline start requests. Requests exceeding the limit fail with a 429 (Too Many Requests) status.
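A client that starts many pipelines at once may want to retry when it receives a 429 response. The following is a minimal sketch of one way to do that with exponential backoff. The send_start callable, the attempt count, and the delays are hypothetical and not part of CDAP; send_start stands in for whatever code issues the start request and returns the HTTP status code.

```python
import time
from typing import Callable

TOO_MANY_REQUESTS = 429


def start_with_backoff(
    send_start: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_start until it returns a status other than 429.

    send_start is a hypothetical callable that issues the pipeline start
    request and returns the HTTP status code. The last status seen is
    returned, so the caller can tell whether the limit was still hit.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = TOO_MANY_REQUESTS
    for attempt in range(max_attempts):
        status = send_start()
        if status != TOO_MANY_REQUESTS:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, and so on.
            sleep(base_delay * (2 ** attempt))
    return status
```

Passing sleep as a parameter keeps the sketch easy to test without real waiting.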

CDAP-18712: Added new Metadata parameter (metadata.messaging.publish.size.limit) to cdap-default.xml to limit the size of published lineage messages to avoid out of memory errors due to large lineages.

CDAP-18672: Added new Dataset parameter (data.storage.sql.scan.size.rows) to cdap-default.xml to set the number of rows fetched for database reads from PostgreSQL.

CDAP-18559, CDAP-17986: Added retries to Dataproc API calls to ensure transient errors don’t affect cluster provisioning.

CDAP-18594, CDAP-18810: Fixed an issue where a pipeline could not be deleted because the program state was not updated after retries.

CDAP-18857: Added new Applications parameter (app.artifact.parallelism.max) to cdap-default.xml that limits artifact repository initialization parallelism to prevent Out of Memory errors on App Fabric startup.

CDAP-18848: Reduced the default of the Metrics parameter (metrics.processor.queue.size) from 20000 to 1000 to prevent Out of Memory errors during metric processing.

CDAP-18791, CDAP-18627, CDAP-18553: Improved LevelDB performance and memory usage.

CDAP-18748, CDAP-18737, CDAP-18685, CDAP-18680: Improved running pipelines handling during App Fabric restarts.

CDAP-18656: Prevented an App Fabric Out of Memory error when retrieving a long list of pipelines within a namespace.

CDAP-18603: Added pagination to Lifecycle Microservices List Applications.
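With pagination, a client reads the application list page by page instead of in a single response. The following is a minimal sketch of such a loop. The fetch_page callable is hypothetical, and the field names "applications" and "nextPageToken" are assumptions that are not stated in these notes; check the Lifecycle Microservices reference for the actual request and response shape.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_applications(
    fetch_page: Callable[[Optional[str]], Dict],
) -> Iterator[Dict]:
    """Yield every application record across all pages.

    fetch_page is a hypothetical callable: given a page token (None for
    the first page), it returns a dict with an "applications" list and an
    optional "nextPageToken". Both field names are assumptions.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("applications", [])
        token = page.get("nextPageToken")
        if not token:
            # No token means the last page has been read.
            break
```

Because the function is a generator, only one page is held in memory at a time.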

CDAP-18586: Prevented an App Fabric Out of Memory error when the system argument list is too long.

Bug Fixes

PLUGIN-1035: Fixed an issue that caused pipelines to fail when a Database batch source included a decimal column with precision greater than 19.

PLUGIN-1022: Fixed an issue that caused pipelines with a Conditional plugin and running on MapReduce to fail.

PLUGIN-1015: Fixed an issue that caused pipelines with a Conditional plugin and running on Spark to fail.

PLUGIN-974: Fixed an issue that caused validation to fail for GCS Multi File sinks.

Behavior Changes  

CDAP-18586: The getApplicationSpecification() method in the io.cdap.cdap.api.schedule.ProgramStatusTriggerInfo interface has been removed in CDAP 6.6.0, which can break your CDAP build if you use this method.

Known Issues

SQL Server Replication Source

CDAP-19354: The default setting for the snapshot transaction isolation level (snapshot.isolation.mode) is repeatable_read, which locks the source table until the initial snapshot completes. If the initial snapshot takes a long time, this can block other queries. 

If the transaction isolation level doesn't work or is not enabled on the SQL Server instance, follow these steps:

  1. Configure SQL Server with one of the following transaction isolation levels:

     • In most cases, set snapshot.isolation.mode to snapshot.

     • If the schema will not be modified during the initial snapshot, set snapshot.isolation.mode to read_committed.

     For more information, see Enable the snapshot transaction isolation level in SQL Server 2005 Analysis Services.

  2. After SQL Server is configured, pass a Debezium argument to the Replication job. To pass a Debezium argument to a Replication job in CDAP, specify a runtime argument prefixed with source.connector. For example, set the Key to source.connector.snapshot.isolation.mode and the Value to snapshot.

For more information about setting a Debezium property, see Pass a Debezium argument to a Replication job.
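The key naming described above can be sketched as a small helper that turns Debezium property names into runtime argument keys. The function name and the dictionary-based interface are hypothetical; only the source.connector prefix and the example key and value come from these notes.

```python
from typing import Dict

# Prefix that marks a runtime argument as a Debezium property.
PREFIX = "source.connector."


def debezium_runtime_args(properties: Dict[str, str]) -> Dict[str, str]:
    """Map Debezium property names to Replication runtime argument keys.

    Keys that already carry the prefix are left unchanged.
    """
    return {
        (key if key.startswith(PREFIX) else PREFIX + key): value
        for key, value in properties.items()
    }


args = debezium_runtime_args({"snapshot.isolation.mode": "snapshot"})
```

Here args holds the single pair from the example above: the Key source.connector.snapshot.isolation.mode with the Value snapshot.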

Created in 2020 by Google Inc.