CDAP Release 6.2.0

Important: CDAP 6.2.0 is deprecated.

Summary

This release introduces a number of new features, improvements, and bug fixes for CDAP. Some of the main highlights of the release are:

Replication

A CDAP application that you can use to replicate data with low latency and in real time from transactional and operational databases into analytical data warehouses.

Google Cloud Dataproc Runtime Improvement

The Google Cloud Dataproc runtime now uses native Dataproc APIs for job submission instead of SSH.

Pipeline Studio Improvements

Added the ability to perform bulk operations (copy, delete) in the Pipeline Studio. Also added a right-click context menu for the Pipeline Studio.

New Features

CDAP-16385 - Added JDBC plugin selector widget.
CDAP-16339 - Introduced a new REST endpoint for fetching scheduled time for multiple programs.
CDAP-16243 - Added the capability to start system applications using application-specific config during startup.
CDAP-16223 - Added Replication feature.
CDAP-16210 - Added support for connecting to multiple hubs through the market.base.urls property in cdap-site.xml.
CDAP-16130 - Added the ability to right-click on the Pipeline Studio canvas to add a Wrangler source. This allows you to add multiple Wrangler sources (source + Wrangler transform) in the same pipeline without losing context.
CDAP-16107 - Added support for Spark 2.4.
CDAP-15941 - Added date picker widget to allow users to specify a single date or date range in a plugin.
CDAP-15633 - Added support to launch a job using Google Cloud Dataproc APIs.
CDAP-9014 - Added the ability to select multiple plugins and connections in Pipeline Studio and copy or delete them in bulk.

Improvements

CDAP-16633 - Added an option to generate scoped GoogleCredentials with Google BigQuery and Google Drive scope for all Google BigQuery requests.
CDAP-16572 - Added macro support for Format field in Google Cloud Storage plugin.
CDAP-16525 - Added an option for Database source to replace characters in the field names.
CDAP-16809 - Added support for copying the header on compressed files.
CDAP-16656 - Added support for rendering large schemas (>1000 fields) in Pipeline UI by collapsing complex schemas and lazy-loading fields in record types.
CDAP-16616 - Made the View Raw Logs and Download Logs buttons always enabled on the log viewer page.
CDAP-16593 - Restricted the maximum number of network tags for a Dataproc VM to 64.
CDAP-16586 - Changed behavior for selecting multiple nodes in Studio to require the user to hold the [shift] key and click on the plugins (instead of holding [ctrl] and clicking).
CDAP-16521 - Improved program startup performance by using a thread pool to start a program instead of starting from a single thread.
CDAP-16517 - Added an option to skip header in the files in delimited, csv, tsv, and text formats.
CDAP-16509 - Reduced the memory footprint of StructuredRecord, which improves overall memory consumption for pipeline execution.
CDAP-16351 - Added an API that returns the names of input stages.
CDAP-16330 - Replaced config.getProperties with config.getRawProperties to make sure validation happens on raw value before macros are evaluated.
CDAP-16324 - Added macro support for Analytics plugins.
CDAP-16308 - Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default).
CDAP-16249 - Added the ability to show dropped field operations from the field level lineage page.
CDAP-16248 - For field level lineage, added ability for user to view all fields in a cause or impact dataset (not just the related fields).
CDAP-16211 - Unified JSON structure used by REST endpoints for fetching pipeline configuration and deploying pipelines.
CDAP-15894 - Added ability for user to navigate to non-target dataset by selecting the header of the dataset in field level lineage.
CDAP-15579 - Added the ability for SparkCompute and SparkSink to record field level lineage.
CDAP-15061 - Added a page level error when the user navigates to an invalid pipeline via the URL.
CDAP-13643 - Added support for recording field level lineage in streaming pipelines.
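
The change in CDAP-16521 (starting programs from a thread pool instead of a single thread) can be illustrated with a minimal Java sketch. This is not CDAP's implementation: the class PooledStartupSketch, the method startProgram, and the pool size are hypothetical names and values chosen only to show the general technique of submitting independent start tasks to a fixed-size pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledStartupSketch {

  // Hypothetical stand-in for the work needed to start one program.
  public static String startProgram(String name) {
    return name + ":STARTED";
  }

  // Submits one start task per program to a fixed-size pool and waits for all of them.
  public static List<String> startAll(List<String> programs, int poolSize) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Callable<String>> tasks = new ArrayList<>();
      for (String program : programs) {
        tasks.add(() -> startProgram(program));
      }
      List<String> results = new ArrayList<>();
      // invokeAll returns futures in the same order as the submitted tasks.
      for (Future<String> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } finally {
      // Always release the pool's threads, even if a start task failed.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(startAll(Arrays.asList("pipelineA", "pipelineB", "pipelineC"), 2));
  }
}
```

With a single thread, each start waits for the previous one to finish; with a pool, up to poolSize starts proceed concurrently, which is where a startup-time improvement would come from.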

Bug Fixes

CDAP-16816 - Fixed schedule properties to overwrite preferences set on the application instead of the other way around. This most visibly fixed a bug where the compute profile set on a pipeline schedule or trigger would get overwritten by the profile for the pipeline.
CDAP-16751 - Fixed a bug where the UI overwrote the scale and precision properties in a schema with the decimal logical type if the value was 0.
CDAP-16736 - Fixed record schema comparison to include record name.
CDAP-16725 - Fixed a bug where concurrent preview runs were failing because SparkConf for the new preview runs was getting populated with the configurations from the previously started in-progress preview run.
CDAP-16724 - Fixed a bug in Wrangler that would cause it to run out of memory when sampling a Google Cloud Storage object that has a lot of rows.
CDAP-16664 - Fixed a bug that resulted in failure to update/upsert to Google BigQuery in a different project.
CDAP-16663 - Fixed a bug where the UI incorrectly showed "No schema available" when the output of the previous stage is a macro.
CDAP-16655 - Fixed a bug in File source that prevented reading files from Google Cloud Storage.
CDAP-16614 - Fixed the fetch run records API to honor the limit query parameter correctly.
CDAP-16581 - Fixed a bug that prevented a user from using the parse-as-json directive in Wrangler.
CDAP-16538 - Fixed a bug in the PluginProperties class where the internal map was modifiable.
CDAP-16526 - Fixed Google BigQuery sink to properly allow certain types as clustering fields.
CDAP-16501 - Fixed a bug so that pipeline stage metrics update correctly in the UI.
CDAP-16471 - Fixed a bug that would leave zombie processes when using the Remote Hadoop Provisioner.
CDAP-16465 - Fixed a bug where Wrangler database connections could show more tables than those in the configured database.
CDAP-16453 - Fixed a bug with LimitingInputFormat that made Database source plugin fail in preview mode.
CDAP-16425 - Fixed macro support for output schema in Google BigQuery source plugin.
CDAP-16309 - Fixed a race condition that could cause failures when running a Spark program.
CDAP-16240 - Fixed a bug so that master and worker memory in Google Cloud Dataproc compute profiles is shown in GB.
CDAP-16055 - Fixed a bug where the failure message emitted by Spark driver was not being collected.
CDAP-16002 - Fixed a bug that caused errors when Wrangler's parse-as-csv with header was used when reading multiple small files.
CDAP-15775 - Fixed a bug that disallowed writing to an empty Google BigQuery table without any data or schema.
CDAP-15649 - Fixed a bug that would cause the Google BigQuery sink to fail the pipeline run if there was no data to write.
CDAP-14850 - Fixed a bug in the custom date range picker that prevented users from setting a custom date range that is not in the current year.
CDAP-14190 - Fixed a bug where users could not delete the entire column name in Wrangler.
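
The kind of problem described in CDAP-16538 (an internal map that callers could modify) is commonly addressed with a defensive copy wrapped in an unmodifiable view. The Java sketch below shows that general pattern only. ImmutablePropsSketch is a hypothetical class and is not the actual PluginProperties code, whose fix may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ImmutablePropsSketch {

  private final Map<String, String> properties;

  public ImmutablePropsSketch(Map<String, String> properties) {
    // Copy first so later changes to the caller's map are not visible here,
    // then wrap so the map returned by the getter cannot be mutated.
    this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
  }

  public Map<String, String> getProperties() {
    return properties;
  }

  public static void main(String[] args) {
    Map<String, String> source = new HashMap<>();
    source.put("format", "csv");
    ImmutablePropsSketch props = new ImmutablePropsSketch(source);

    // Mutating the original map after construction does not affect the holder.
    source.put("delimiter", ",");
    System.out.println(props.getProperties().size());

    try {
      props.getProperties().put("format", "tsv");
    } catch (UnsupportedOperationException e) {
      System.out.println("internal map is not modifiable");
    }
  }
}
```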