Release Date: March 24, 2021

Note

Important: CDAP 6.4.0 is deprecated.

New Features

Datetime Data Type

...

Added new Wrangler directives that you can use in Power Mode to parse strings as datetime values and to work with datetime columns: Parse as Datetime, Current Datetime, Datetime to Timestamp, Format Datetime, Timestamp to Datetime.

CDAP-17620: Added support for the Datetime logical data type in CDAP schemas.

Dataproc

CDAP-17622: Added machine type, cluster properties, and idle TTL as configuration settings for the Dataproc provisioner. For more information, see Google Dataproc.

...

CDAP-17611: Updated Salesforce plugins to work with the new OAuth macro function.

CDAP-17610: Implemented a new macro function for OAuth token exchange.

CDAP-17609: Implemented new HTTP endpoints for OAuth management.

Replication

CDAP-17674: Added the retain.staging.table runtime argument, which lets users retain the BigQuery staging table to help debug issues.

CDAP-17595: Added upgrade support for replication jobs.

CDAP-17471: Added the ability to duplicate, export, and import replication jobs.

CDAP-17337: Added property to configure dataset name in the BigQuery replication target. By default, the dataset name is the same as the Replication source database name. For more information, see Google BigQuery Target.

...

CDAP-17618: For CDAP on Kubernetes (K8s), replaced ZooKeeper with Kubernetes secrets. For more information, see Prepare the secret token for authentication service.

CDAP-17466: Added authentication for CDAP on Kubernetes. For more information, see Installation on Kubernetes.

Joiner Analytics Plugin

CDAP-17607: Added advanced join conditions to the Joiner plugin. This allows users to specify an arbitrary SQL condition to join on. These joins are typically much more costly to perform than basic joins on equality. For more information, see Join Condition Type.
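The cost difference between equality joins and arbitrary-condition joins can be illustrated with a minimal, generic sketch. This is not the Joiner plugin's implementation (which runs on Spark); the function names and sample fields are hypothetical. An equality join can build a hash index on the join key and probe it once per row, while an arbitrary predicate generally forces every left/right pair to be evaluated.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def equality_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Hash join on left[key] == right[key]: roughly O(len(left) + len(right) + output)."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        # Build a hash index on the right side, keyed by the join column.
        index.setdefault(r[key], []).append(r)
    # Probe the index once per left row; only matching rows are touched.
    return [(l, r) for l in left for r in index.get(l[key], [])]


def condition_join(
    left: List[Row], right: List[Row], condition: Callable[[Row, Row], bool]
) -> List[Tuple[Row, Row]]:
    """Nested-loop join on an arbitrary predicate: evaluates all len(left) * len(right) pairs."""
    return [(l, r) for l in left for r in right if condition(l, r)]


if __name__ == "__main__":
    orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
    tiers = [{"tier": "basic", "min_amount": 0}, {"tier": "gold", "min_amount": 100}]
    # A range condition such as "amount >= min_amount" cannot use a hash index.
    for order, tier in condition_join(orders, tiers, lambda o, t: o["amount"] >= t["min_amount"]):
        print(order["id"], tier["tier"])
```

With large inputs, the pairwise evaluation in `condition_join` grows with the product of the input sizes, which is why advanced join conditions are usually reserved for cases that an equality join cannot express.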

...

PLUGIN-558: Added a new post-action plugin, GCS Done File Marker. To help you orchestrate downstream/dependent processes, this post-action plugin marks the end of a pipeline run by creating and storing an empty DONE (or SUCCESS) file in the given GCS bucket upon pipeline completion, success, or failure.
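A downstream process can use the marker by waiting for it to appear before starting. The sketch below is a minimal illustration that uses a local directory as a stand-in for the GCS bucket; the function name, directory, timeout, and polling interval are hypothetical. A real consumer would check for the object in GCS instead (for example with the google-cloud-storage client's `Blob.exists()`).

```python
import time
from pathlib import Path
from typing import Iterable, Optional


def wait_for_marker(
    directory: str,
    names: Iterable[str] = ("DONE", "SUCCESS"),
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> Optional[str]:
    """Poll `directory` until one of the marker files exists.

    Returns the name of the marker that was found, or None if the timeout expires.
    The local directory stands in for the GCS bucket written by the post-action plugin.
    """
    deadline = time.monotonic() + timeout_s
    base = Path(directory)
    while True:
        for name in names:
            if (base / name).exists():
                return name
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


if __name__ == "__main__":
    marker = wait_for_marker("/tmp/pipeline-output", timeout_s=5.0)
    if marker is None:
        print("pipeline has not finished yet; not starting downstream job")
    else:
        print(f"found {marker} marker; starting downstream job")
```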

Improvements

PLUGIN-601: Added a metric for bytes read from database source, which appears in the Spark UI.

PLUGIN-571: Added support to filter tables in the Multiple Database Tables Batch Source.

PLUGIN-570: Improved error handling for the Multiple Database Tables Batch Source and the BigQuery multi-table sink so that pipelines can continue if one or more tables fail.

CDAP-17724: Renamed replication pipelines to jobs.

CDAP-17721: Added support for Kerberos login in K8s environment.

CDAP-17675: Renamed the Delete button to Remove in the Replication Assessment report.

CDAP-17670: Improved plugin initialization performance.

CDAP-17650: Added a tag with parent artifact details to Dataproc clusters created by CDAP.

CDAP-17645: Set a timeout on the SSH connection so that pipeline runs fail when the cluster becomes unreachable.

CDAP-17642: Added namespace count to Dataplane metrics.

CDAP-17621: Added the Customer-Managed Encryption Key (CMEK) configuration property for the replication BigQuery target. For more information, see Google BigQuery Replication Target.

CDAP-17613: Improved the Replication Assessment page to highlight SQL Server tables with schema issues in red.

CDAP-17603: Added the ability to jump to any step when modifying a Replication draft.

CDAP-17601: Improved performance by loading data directly into the target table during replication snapshot process.

CDAP-17597: Added poll metrics to Overview and Monitoring in the Replication detail view.

CDAP-17582: Added the ability to pass additional properties to Debezium and JDBC drivers for replication sources.

CDAP-17482: Added the ability to start a Replication app from the last known checkpoint.

CDAP-17474: Added support for configuring Elasticsearch TLS connections to trust all certificates. For more information, see Elasticsearch.

CDAP-17414: Improved Replication Table selection user experience.

CDAP-17289: Improved reliability of Pub/Sub Source plugin.

CDAP-17248: Added File Encoding property (ISO-8859, Windows and EBCDIC) to Amazon S3, File and GCS File Reader batch source plugins.

CDAP-17114: Removed the record view in pipeline preview for the Joiner node because it was misleading.

CDAP-16548: Renamed the Staging Bucket Location property to Location in the BigQuery Target properties page. For more information, see Google BigQuery Target.

CDAP-16623: Removed multiple ways to collapse/expand the Connection menu.

CDAP-16008: Added support for running pipelines on Hadoop clusters with Kerberos enabled.

CDAP-15552: Fixed Wrangler to highlight new column generated by a directive.

Behavior Changes

CDAP-16180: Macros are now resolved to preferences during pipeline validation.

...

PLUGIN-545: Added support for strings in Min/Max aggregate functions (used in both Group By and Pivot plugins).

PLUGIN-539: Fixed Salesforce plugin to correctly parse the schema as Avro schema to make sure all the field names are accepted by Avro.

PLUGIN-517: Fixed an issue where data pipelines with a BigQuery sink failed with an INVALID_ARGUMENT exception if the range was specified as a macro.

PLUGIN-222: Fixed Kinesis Spark Streaming source, which had a class conflict, so users can now run pipelines with this source.

CDAP-17746: Fixed an issue in field validation logic in pipelines with BigQuery sink that caused a NullPointerException.

CDAP-17744: Fixed Schema editor to show UI validations.

CDAP-17737: Fixed Conditions plugins to work with Spark 3.

CDAP-17732: Fixed the Wrangler Generate UUID directive to correctly generate a universally unique identifier (UUID) of the record.

CDAP-17718: Fixed advanced joins to recognize auto broadcast setting.

CDAP-17717: Fixed upgraded CDAP instances to include the arrow to the Error Collector.

CDAP-17713: Fixed the Pipeline Studio UI to send null instead of a string for blank plugin properties.

CDAP-17703: Fixed Pipeline Studio to use current namespace when it fetches data pipeline drafts.

CDAP-17691: Fixed SecureStore API to support SYSTEM namespace.

CDAP-17683: Fixed million indicator on Replication Monitoring page.

CDAP-17680: Fixed Replication statistics to display on the dashboard for SQL Server.

CDAP-17678: Fixed an issue where clicking the Delete button on Replication Assessment page resulted in an error for the replication job.

CDAP-17653: Removed the use of the authorization token when generating the session token in the Node.js proxy.

CDAP-17641: Schema name is now shown when selecting tables to replicate.

CDAP-17635: Fixed Replication to correctly insert rows that were previously deleted by a replication job.

CDAP-17630: Data pipelines running on Spark 3-enabled Dataproc clusters no longer fail with a class not found exception.

CDAP-17617: Fixed Replication Overview page to display the label of the table status when you hover over the table status.

CDAP-17598: Added the ability to hover over metrics in the Pipeline Summary page.

CDAP-17591: Fixed Wrangler completion percentage.

CDAP-17584: Fixed Replication with a SQL Server source to generate rows correctly in BigQuery target table if snapshot failed and restarted.

CDAP-17570: Fixed an issue where SQL Server replication job stopped processing data when the connection was reset by the SQL Server.

CDAP-17568: Fixed the Replication wizard to close without error when you click the X icon to exit.

CDAP-17495: Fixed an error in Replication wizard Step 3 "Select tables, columns and events to replicate" where selecting no columns for a table caused the wizard to fetch all columns in a table.

CDAP-17491: Using a macro for a password in a replication job no longer results in an error.

CDAP-17483: Fixed logical type display for data pipeline preview runs.

CDAP-17476: Fixed the Dashboard API to return programs that are running but started before the startTime.

CDAP-17450: Fixed Replication job (when deployed) to show advanced configurations in UI.

CDAP-17347: Fixed data pipeline with Python Evaluator transformation to run without stack trace errors.

CDAP-17331: Suppressed verbose info logs from Debezium in Replication jobs.

CDAP-17189: Added loading indicator while fetching logs in Log Viewer.

CDAP-17028: Fixed Pipeline preview so logical start time function doesn’t display as a macro.

CDAP-16804: Fixed fields with a list drop down menu in the Replication wizard to default to “Select one”.

CDAP-16726: Added message in Replication Assessment when there are tables that CDAP cannot access.

CDAP-16609: Added an error message when an invalid expression is added in Wrangler.

CDAP-16316: Fixed RENAME directive in Wrangler so it’s case sensitive.

CDAP-16233: Fixed the Pipeline Operations UI so it no longer shows the loading icon indefinitely when it gets an error from the backend.

CDAP-15979: Fixed Wrangler to no longer generate invalid reference names.

CDAP-15509: Fixed Wrangler to display logical types instead of Java types.

CDAP-15465: Fixed pipelines created from Wrangler to no longer generate incorrect for XML files.

CDAP-13907: Fixed an issue where adding a connection in Wrangler hard-coded the name of the JDBC driver.

CDAP-13281: Batch data pipelines with the Spark 2.2 engine and HDFS sinks no longer fail with a delegation token error.

Known Issues

BigQuery Sinks

...

In the Hub, download Google Cloud Platform version 0.17.1. For each pipeline, replace BigQuery sink plugins version 0.17.0 with BigQuery sink plugins version 0.17.1. If a pipeline has a BigQuery sink and other Google Cloud Platform plugins, such as a BigQuery source, you must update all Google Cloud Platform plugins to version 0.17.1. Google Cloud Platform plugins in the same pipeline must be the same version.
To quickly update each plugin, export all pipelines that use BigQuery sinks. You can use the Pipeline Studio to export pipelines in Draft and Deployed states. You can also use the Lifecycle Microservices to export pipelines in Deployed state in batch. Then import them back into Pipeline Studio. Pipeline Studio prompts you to update the plugins to version 0.17.1. Because imported pipelines are in Draft state, you'll need to deploy each pipeline after you import it.
Also, set version 0.17.1 as the default for all Google Cloud Platform plugins. For more information, see Working with multiple versions of the same plugin.
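Before re-importing, you can check each exported pipeline to confirm that all Google Cloud Platform plugins use the same, updated version. The sketch below assumes the exported pipeline JSON lists stages under `config.stages`, with each stage's plugin artifact at `plugin.artifact` (fields `name` and `version`), and that Google Cloud Platform plugins use the artifact name `google-cloud`. Verify these assumptions against your own exports; the function names are hypothetical.

```python
import json
from typing import Dict, List

GCP_ARTIFACT = "google-cloud"  # assumed artifact name for Google Cloud Platform plugins
REQUIRED_VERSION = "0.17.1"


def load_pipeline(path: str) -> dict:
    """Read one exported pipeline JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def gcp_plugin_versions(pipeline: dict) -> Dict[str, str]:
    """Map stage name -> version of the Google Cloud plugin artifact used by that stage."""
    versions: Dict[str, str] = {}
    for stage in pipeline.get("config", {}).get("stages", []):
        artifact = stage.get("plugin", {}).get("artifact", {})
        if artifact.get("name") == GCP_ARTIFACT:
            versions[stage.get("name", "?")] = artifact.get("version", "?")
    return versions


def check_pipeline(pipeline: dict, required: str = REQUIRED_VERSION) -> List[str]:
    """Return a list of problems; an empty list means the pipeline looks consistent."""
    versions = gcp_plugin_versions(pipeline)
    problems: List[str] = []
    # All Google Cloud Platform plugins in one pipeline must share a version.
    if len(set(versions.values())) > 1:
        problems.append(f"mixed Google Cloud plugin versions: {sorted(set(versions.values()))}")
    # Each Google Cloud Platform plugin should be on the required version.
    for stage, version in sorted(versions.items()):
        if version != required:
            problems.append(f"stage {stage!r} uses {GCP_ARTIFACT} {version}, expected {required}")
    return problems
```

For example, `check_pipeline(load_pipeline("my-pipeline.json"))` returns an empty list when every Google Cloud Platform stage is on 0.17.1. Pass a different `required` value if you deploy a later version, such as 0.17.2.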

Info

Update: The Hub now has Google Cloud Platform version 0.17.2 available to deploy, which includes the changes made in 0.17.1 plus additional changes.

Joiner Analytics

PLUGIN-669: Joiner plugin version 2.6.0 does not show join conditions.

...