CDAP-18738: Dataproc Cluster Reuse. The runtime property system.profile.properties.clusterReuseEnabled is no longer required to enable cluster reuse. The default Max Idle Time is now 30 minutes to prevent accidental cluster leaks.
CDAP-18725: Added more details for pipeline success and failure metrics.
CDAP-18712: Added ability to limit published lineage messages to a configurable size to avoid out of memory errors due to large lineages.
CDAP-18725: Added new tags (Provisioner, Cluster Status, Existing Status) to the existing program failure and success metrics.
CDAP-17772: Added authentication and authorization between internal system services via token verification.
Instance Stability and Memory Usage
CDAP-18696: Added new Applications parameter (app.max.concurrent.launching) to cdap-default.xml to control back pressure on pipeline start requests. Requests exceeding the limit fail with a 429 (Too Many Requests) status.
CDAP-18712: Added new Metadata parameter (metadata.messaging.publish.size.limit) to cdap-default.xml to limit the size of published lineage messages to avoid out of memory errors due to large lineages.
CDAP-18672: Added new Dataset parameter (data.storage.sql.scan.size.rows) to cdap-default.xml to set the number of rows fetched for database reads from PostgreSQL.
CDAP-18559, CDAP-17986: Added retries to Dataproc API calls to ensure transient errors don’t affect cluster provisioning.
CDAP-18594, CDAP-18810: Fixed an issue where a pipeline could not be deleted because the program state was not updated after retries.
CDAP-18857: Added new Applications parameter (app.artifact.parallelism.max) to cdap-default.xml that limits artifact repository initialization parallelism to prevent Out of Memory errors on App Fabric startup.
CDAP-18848: Reduced the default of the Metrics parameter (metrics.processor.queue.size) from 20000 to 1000 to prevent Out of Memory errors during metric processing.
CDAP-18586: Prevented App Fabric Out of Memory errors when the system argument list is too long.
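Because start requests above the app.max.concurrent.launching limit are rejected with 429, a client that starts many pipelines may want to retry with backoff instead of treating the rejection as a failure. The following is a minimal Python sketch of that idea, not part of CDAP. The function name start_with_backoff, the send_start callable (assumed to issue one start request and return the HTTP status code), and the attempt and delay defaults are all hypothetical.

```python
import time
from typing import Callable

TOO_MANY_REQUESTS = 429  # status returned when the launch limit is exceeded


def start_with_backoff(
    send_start: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_start until it returns a status other than 429.

    send_start is a hypothetical callable that issues one pipeline start
    request and returns the HTTP status code. The last status seen is
    returned, so the caller can still detect a persistent 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = TOO_MANY_REQUESTS
    for attempt in range(max_attempts):
        status = send_start()
        if status != TOO_MANY_REQUESTS:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * (2 ** attempt))
    return status
```

Passing sleep as a parameter keeps the sketch easy to exercise without real delays. A production client would likely also add jitter and an overall timeout.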
PLUGIN-1035: Fixed an issue that caused pipelines to fail when a Database batch source included a decimal column with precision greater than 19.
PLUGIN-1022: Fixed an issue that caused pipelines that include a Conditional plugin and run on MapReduce to fail.
PLUGIN-1015: Fixed an issue that caused pipelines that include a Conditional plugin and run on Spark to fail.
PLUGIN-974: Fixed an issue that caused validation to fail for GCS Multi File sinks.
CDAP-18586: The getApplicationSpecification() method in the io.cdap.cdap.api.schedule.ProgramStatusTriggerInfo interface was removed in CDAP 6.6.0, which can break your build if you use this method.
SQL Server Replication Source
CDAP-19354: The default setting for the snapshot transaction isolation level (snapshot.isolation.mode) is repeatable_read, which locks the source table until the initial snapshot completes. If the initial snapshot takes a long time, this can block other queries.
If the transaction isolation level doesn't work or is not enabled on the SQL Server instance, follow these steps:
1. Configure SQL Server with one of the following transaction isolation levels:
In most cases, set snapshot.isolation.mode to snapshot.
If schema modification will not happen during the initial snapshot, set snapshot.isolation.mode to read_committed.
2. After SQL Server is configured, pass a Debezium argument to the Replication job. To do so in CDAP, specify a runtime argument prefixed with source.connector. For example, set the Key to source.connector.snapshot.isolation.mode and the Value to snapshot.
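The prefixing rule in step 2 can be sketched in a few lines of Python. This is an illustration only: the helper name to_runtime_arguments is hypothetical, and it assumes the prefix and the Debezium property name are joined with a dot, as in the example key above.

```python
from typing import Dict

CONNECTOR_PREFIX = "source.connector."  # prefix named in step 2, plus the joining dot


def to_runtime_arguments(debezium_properties: Dict[str, str]) -> Dict[str, str]:
    """Prefix each Debezium property name so it can be used as a runtime argument Key."""
    return {
        CONNECTOR_PREFIX + name: value
        for name, value in debezium_properties.items()
    }


runtime_args = to_runtime_arguments({"snapshot.isolation.mode": "snapshot"})
print(runtime_args)
# prints {'source.connector.snapshot.isolation.mode': 'snapshot'}
```

Each resulting Key and Value pair is what would be entered as a runtime argument for the Replication job.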