CDAP Release 6.9.0

Features

CDAP-20454: In the Wrangler transformation, added support for specifying preconditions (filters) in SQL, and added support for pushing down SQL preconditions to BigQuery through Transformation Pushdown.

CDAP-20440: For the Multiple Database Tables Batch Source, added field-level lineage support.

CDAP-20288: Added support for Dataproc driver node groups. To use Dataproc driver node groups, configure the following properties when you create the Dataproc cluster:

  • yarn:yarn.nodemanager.resource.memory.enforced=false

  • yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME

Note: When you create the cluster with the gcloud CLI, enclose the second property in single quotation marks ('yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME') so that the shell doesn't expand $SPARK_HOME locally before submitting the command.
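The quoting rule in the note can be illustrated with a minimal Python sketch that assembles the gcloud command line. The cluster name and region are placeholder values, the helper name is hypothetical, and the flags that define the driver node group itself are omitted. The standard library function shlex.quote wraps any argument containing $ in single quotation marks, which is the form the note asks for.

```python
import shlex

# The two YARN properties listed in the release note above.
DRIVER_NODE_GROUP_PROPERTIES = [
    "yarn:yarn.nodemanager.resource.memory.enforced=false",
    "yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME",
]


def build_create_command(cluster_name: str, region: str) -> str:
    """Return a shell-safe 'gcloud dataproc clusters create' command line.

    Hypothetical helper: cluster_name and region are placeholders, and the
    flags that configure the driver node group itself are not included.
    """
    args = [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--region", region,
        "--properties", ",".join(DRIVER_NODE_GROUP_PROPERTIES),
    ]
    # shlex.quote single-quotes any argument with shell-special characters
    # such as '$', so $SPARK_HOME reaches Dataproc unexpanded.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_create_command("my-cluster", "us-central1"))
```

Running the script prints the command with the whole --properties value enclosed in single quotation marks, so it can be pasted into a shell without $SPARK_HOME being resolved locally.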

CDAP-19628: Added support for Window Aggregation operations in Transformation Pushdown to reduce the pipeline execution time by performing SQL operations in BigQuery instead of Spark.

CDAP-19425: Added support for editing deployed pipelines.

Improvements

CDAP-20381: Added the ability to configure Java options for a pipeline run by setting the system.program.jvm.opts runtime argument.
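As an illustration, the sketch below builds (but does not send) a request that starts a batch pipeline with this runtime argument. It assumes the pipeline is started through the Lifecycle Microservices start endpoint for the DataPipelineWorkflow program; the base URL, namespace, pipeline name, and JVM option value are placeholders, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_start_request(base_url: str, namespace: str, pipeline: str,
                        jvm_opts: str) -> urllib.request.Request:
    """Build a POST request that starts a batch pipeline with JVM options.

    Hypothetical helper: the endpoint shape is an assumption about how the
    pipeline is started, and all argument values are placeholders.
    """
    url = (f"{base_url}/v3/namespaces/{namespace}/apps/{pipeline}"
           "/workflows/DataPipelineWorkflow/start")
    # Runtime arguments are sent as a flat JSON map of string keys and values.
    body = json.dumps({"system.program.jvm.opts": jvm_opts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = build_start_request(
        "http://localhost:11015", "default", "my_pipeline", "-XX:+UseG1GC")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would submit it. Authentication headers, if the instance requires them, are not shown.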

CDAP-20140: Replication pipelines now log statistics for the events processed by source and target plugins at a fixed interval.

Changes

CDAP-20430: Fixed the pipeline stage validation API to return unevaluated macro values, which prevents the values of secure macros from being returned.

CDAP-20373: When you duplicate a pipeline, CDAP appends _copy to the pipeline name when it opens in the Pipeline Studio. In previous releases, CDAP appended _<v1, v2, v3> to the name.

Bug Fixes

CDAP-20549: Fixed an issue where executor resource settings were not honored when app.pipeline.overwriteConfig was set.

CDAP-20458: Fixed an issue where the flow control running count metric (system.flowcontrol.running.count) might be stale if no new pipelines or replication jobs were started.

CDAP-20431: Fixed an issue that sometimes caused pipelines running on Dataproc to fail with the following error: Unsupported program type: Spark. When the first pipeline to run on a newly created or upgraded instance contained only actions, it succeeded. However, subsequent pipeline runs that included sources or sinks might have failed with this error.

CDAP-20301: Fixed an issue where a replication job retried indefinitely when it failed to process a DDL operation.

CDAP-20276: For replication jobs, fixed an issue where retries for transient errors from BigQuery might have resulted in data inconsistency.

CDAP-19389: For SQL Server replication sources, fixed an issue on the Review assessment page, where SQL Server DATETIME and DATETIME2 columns were shown as mapped to TIMESTAMP columns in BigQuery. This was a UI bug. The replication job mapped the data types to the BigQuery DATETIME type.

PLUGIN-1514: For the Database sink, fixed an issue where the pipeline didn’t fail if there was an error writing data to the database. Now, if there is an error writing data to the database, the pipeline fails and no data is written to the database.

PLUGIN-1513: For BigQuery Pushdown, fixed an issue where, when BigQuery Pushdown was enabled for an existing dataset, the BigQuery Sink executed jobs in the Location specified in the Pushdown configuration instead of the location of the BigQuery dataset. The configured Location should have been used only when creating resources. Now, if the dataset already exists, the Location of the existing dataset is used.
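The corrected rule can be summarized in a short sketch. This is not the plugin's code; the function and parameter names are hypothetical and only restate the behavior described above.

```python
from typing import Optional


def resolve_job_location(existing_dataset_location: Optional[str],
                         configured_location: str) -> str:
    """Pick the location in which BigQuery jobs run.

    Hypothetical illustration of the fixed behavior:
    - existing_dataset_location is None when the dataset does not exist yet.
    - configured_location is the Location from the Pushdown configuration.
    """
    if existing_dataset_location is not None:
        # The dataset already exists, so its own location takes precedence.
        return existing_dataset_location
    # The dataset will be created, so the configured Location applies.
    return configured_location


if __name__ == "__main__":
    print(resolve_job_location("EU", "US"))
    print(resolve_job_location(None, "US"))
```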

PLUGIN-1512: Fixed an issue where pipelines failed when the output schema was overridden in certain source plugins. This was because the output schema didn’t match the order of the fields from the query. This happened when the pipeline included any of the following batch sources:

  • Database

  • Oracle

  • MySQL

  • SQL Server

  • PostgreSQL

  • DB2

  • MariaDB

  • Netezza

  • CloudSQL PostgreSQL

  • CloudSQL MySQL

  • Teradata

Pipelines no longer fail when you override the output schema in these source plugins. CDAP uses the name of the field to match the schema of the field in the result set and the field in the output schema.
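The difference between matching by position and matching by name can be sketched as follows. This is a simplified illustration, not the plugin's implementation; the function name and the sample columns are hypothetical.

```python
from typing import Any, Dict, Sequence


def map_row_by_name(column_names: Sequence[str], row: Sequence[Any],
                    output_fields: Sequence[str]) -> Dict[str, Any]:
    """Build an output record by looking up each schema field by name.

    Hypothetical illustration: column_names and row come from the query
    result set, and output_fields is the (possibly reordered) output schema.
    """
    values_by_name = dict(zip(column_names, row))
    # A lookup by name stays correct when the output schema lists the
    # fields in a different order than the query returns them.
    return {field: values_by_name[field] for field in output_fields}


if __name__ == "__main__":
    columns = ["id", "name", "created"]
    row = [7, "alice", "2023-01-01"]
    # The overridden schema lists the fields in a different order.
    print(map_row_by_name(columns, row, ["name", "id", "created"]))
```

With positional matching, the reordered schema in this example would pair the name field with the value 7; matching by name pairs each field with its own column.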

PLUGIN-1503: Fixed an issue where pipelines that had a Database batch source and an Oracle sink that used a connection object (using SYSDBA) to connect to an Oracle database failed to establish a connection to the Oracle database. This was due to a package conflict between the Database batch source and the Oracle sink plugins.

PLUGIN-1494: For Oracle batch sources, fixed an issue that caused the pipeline to fail when there was a TIMESTAMP WITH LOCAL TIME ZONE column set to NULLABLE and the source had values that were NULL.

PLUGIN-1481: In the Oracle batch source, the Oracle NUMBER data type defined without precision and scale was mapped to the CDAP string data type by default. If an Oracle Sink used these fields to insert into a NUMBER field in the Oracle table, the pipeline failed because the string and NUMBER types were incompatible. Now, the Oracle Sink inserts these string values into NUMBER fields in the Oracle table.

Deprecations

Deprecated APIs (CDAP-20030) 

With the introduction of editing deployed pipelines, the behavior of some APIs has significantly changed. Due to these changes, the following APIs are no longer valid and are deprecated:

Deprecated API | Alternative API
POST /apps/{app-id}/versions/{version-id}/create | PUT /apps/{app-id}
DELETE /apps/{app-id}/versions/{version-id} | DELETE /apps/{app-id}
POST /apps/{app-id}/update | PUT /apps/{app-id}
GET /apps/{app-id}/versions/{version-id}/{program-type}/{program-id}/status | GET /apps/{app-id}/{program-type}/{program-id}/status
PUT /apps/{app-id}/versions/{app-version}/restart-programs | POST /apps/{app-id}/{program-type}/{program-id}/{action} OR create a new un-versioned restart API similar to /apps/{app-id}/restart-programs
GET /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/runs/{run-id} | GET /apps/{app-name}/{program-type}/{program-name}/runs/{run-id}
GET /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/runtimeargs | GET /apps/{app-name}/{program-type}/{program-name}/runtimeargs
PUT /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/runtimeargs | PUT /apps/{app-name}/{program-type}/{program-name}/runtimeargs
GET /apps/{app-name}/versions/{app-version}/{program-type}/{program-name} | GET /apps/{app-name}/{program-type}/{program-name}
GET /apps/{app-name}/versions/{app-version}/schedules/{schedule-name} | GET /apps/{app-name}/schedules/{schedule-name}
GET /apps/{app-name}/versions/{app-version}/schedule | GET /apps/{app-name}/schedules
GET /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/schedules | GET /apps/{app-name}/{program-type}/{program-name}/schedules
PUT /apps/{app-name}/versions/{app-version}/schedules/{schedule-name} | PUT /apps/{app-name}/schedules/{schedule-name}
POST /apps/{app-name}/versions/{app-version}/schedules/{schedule-name}/update | POST /apps/{app-name}/schedules/{schedule-name}/update
DELETE /apps/{app-name}/versions/{app-version}/schedules/{schedule-name} | DELETE /apps/{app-name}/schedules/{schedule-name}
PUT /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/runs/{run-id}/loglevels | PUT /apps/{app-name}/{program-type}/{program-name}/runs/{run-id}/loglevels
POST /apps/{app-name}/versions/{app-version}/{program-type}/{program-name}/runs/{run-id}/resetloglevels | POST /apps/{app-name}/{program-type}/{program-name}/runs/{run-id}/resetloglevels
GET /apps/{app-name}/versions/{app-version}/{service-type}/{program-name}/available | GET /apps/{app-name}/{service-type}/{program-name}/available
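Most of the alternatives listed above differ from the deprecated API only in that the versions/{version} path segment is dropped. For client code that builds these paths, a migration helper might look like the following sketch. The function name and sample paths are hypothetical, and the helper does not cover the entries whose HTTP method or path changes in other ways (the create, update, and restart-programs APIs), which need to be migrated by hand.

```python
import re

# Matches the first "/versions/<version>" segment in a request path.
_VERSION_SEGMENT = re.compile(r"/versions/[^/]+")


def unversioned_path(path: str) -> str:
    """Drop the "/versions/<version>" segment from a versioned app path.

    Hypothetical helper: only valid for deprecated APIs whose alternative
    is the same path without the version segment.
    """
    return _VERSION_SEGMENT.sub("", path, count=1)


if __name__ == "__main__":
    print(unversioned_path(
        "/apps/my_pipeline/versions/1.0.0/workflows/DataPipelineWorkflow/status"))
    print(unversioned_path("/apps/my_pipeline/versions/1.0.0/schedules/daily"))
```

Paths that contain no version segment are returned unchanged, so the helper can be applied to requests that have already been migrated.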

Created in 2020 by Google Inc.