CDAP Release 6.2.2

Important: CDAP 6.2.2 is deprecated.

Summary

This release introduces a number of improvements and bug fixes to CDAP. Some of the main highlights of the release are:

  • Joiner plugin improvements. Added distribution support in the Joiner plugin to improve performance for skewed joins.

  • Wrangler improvements. Added support for BigQuery views and materialized views in Wrangler.

  • BigQuery Source plugin improvements. Added views and materialized views support to BigQuery source.

  • Preview improvements. Improved the scale of the preview system when CDAP is run in a Kubernetes environment. The Preview UI tab is revamped with a new record view.

New Features

  • CDAP-16690 - Added a revamped Preview tab with a new Record view for large schemas.

Improvements

  • CDAP-16668 - Added support for creating autoscale Dataproc clusters.

  • CDAP-16682 - When the system is experiencing slowness, users now see a message saying there's a delay.

  • CDAP-16712 - Improved the scalability of the preview system when running in a Kubernetes environment by separating preview runs into their own individual pods. The Preview manager pod is now responsible only for handling the preview REST API.

  • CDAP-17015 - Updated Preview to show number of preview runs pending before current run (if there are any runs pending). The number of pending runs is shown under the timer in the UI.

  • CDAP-17077 - Changed the auto-caching strategy in Spark pipelines to default to disk-only caching instead of memory caching, because out-of-memory failures were common. Also changed the caching strategy to cache only at places that prevent sources from being recomputed, instead of the more aggressive caching done previously.

  • CDAP-17078 - Added an experimental setting to consolidate multiple pipeline branches into single operations in Spark pipelines. This can improve pipeline performance by avoiding recomputation. To turn it on, set a preference or runtime argument for spark.cdap.pipeline.consolidate.stages to true.

  • CDAP-17095 - Added Distribution to AutoJoiner API to increase performance for skewed joins.

  • CDAP-17123 - Made the records.updated metric available for the GCS Batch Sink plugin.

  • CDAP-17130 - Added joiner distribution support to MapReduce and streaming pipelines.

  • CDAP-17179 - Added two new properties, Filesystem properties and Output File Prefix, to the GCS Sink.

  • CDAP-17182 - Enabled traffic compression in the runtime service.

  • CDAP-17198 - Added the Runtime service to the system service statuses.

  • PLUGIN-303 - Added distribution settings to Joiner plugin for increased performance in skewed joins.

  • PLUGIN-386 - Added support for BigQuery Views and Materialized Views to Wrangler.
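
These notes do not describe how the Joiner distribution setting works internally. As an illustration only, the following is a minimal sketch of key salting, a common general technique for spreading a skewed join key across several partitions. Every class and method name here is hypothetical and is not part of the CDAP API, and the "factor" parameter is an assumed stand-in for a distribution size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of key salting for skewed joins; not CDAP code.
// Records are modeled as {key, value} string pairs.
public class SaltedJoinSketch {

  // Tags each record of the skewed input with a random salt in [0, factor),
  // so one hot key is spread over up to `factor` distinct salted keys.
  static List<String[]> saltSkewed(List<String[]> skewed, int factor, Random random) {
    List<String[]> out = new ArrayList<>();
    for (String[] rec : skewed) {
      out.add(new String[] {rec[0] + "#" + random.nextInt(factor), rec[1]});
    }
    return out;
  }

  // Replicates each record of the other input once per salt value,
  // so every salted key on the skewed side still finds its match.
  static List<String[]> explodeOther(List<String[]> other, int factor) {
    List<String[]> out = new ArrayList<>();
    for (String[] rec : other) {
      for (int salt = 0; salt < factor; salt++) {
        out.add(new String[] {rec[0] + "#" + salt, rec[1]});
      }
    }
    return out;
  }

  // Plain hash inner join on the salted key. The salt is stripped from
  // the output, which holds {original key, left value, right value}.
  static List<String[]> join(List<String[]> left, List<String[]> right) {
    Map<String, List<String>> index = new HashMap<>();
    for (String[] rec : right) {
      index.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec[1]);
    }
    List<String[]> out = new ArrayList<>();
    for (String[] rec : left) {
      List<String> matches = index.getOrDefault(rec[0], Collections.<String>emptyList());
      String key = rec[0].substring(0, rec[0].lastIndexOf('#'));
      for (String match : matches) {
        out.add(new String[] {key, rec[1], match});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String[]> skewed = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      skewed.add(new String[] {"hot", "event" + i});
    }
    List<String[]> other = new ArrayList<>();
    other.add(new String[] {"hot", "lookup"});

    int factor = 4; // assumed distribution size
    List<String[]> joined =
        join(saltSkewed(skewed, factor, new Random(42)), explodeOther(other, factor));
    System.out.println("joined rows: " + joined.size());
  }
}
```

The trade-off in this sketch is that the non-skewed input grows by the chosen factor, so the approach pays off only when one input is much more skewed than the other is large.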

Bug Fixes

  • CDAP-12499 - Clarified error message for when branches of a conditional are used as inputs to the same node.

  • CDAP-15214 - Fixed bug that reset date range when navigating from dataset lineage to field level lineage.

  • CDAP-16732 - Fixed an issue where the Dashboard showed graphs when there was no run.

  • CDAP-16824 - Fixed the UI to show plugin configuration for plugins that do not have widget JSON support from the plugin artifact.

  • CDAP-16898 - Fixed bug that did not fetch Preview data when the plugin label had spaces in it.

  • CDAP-17043 - Fixed the dropdown menu for Wrangler tabs. The previous dropdown overlapped with other UI elements, which hindered use of the UI.

  • CDAP-17045 - Fixed a bug so that large pipelines with a hyphen (-) in the name overflow properly in the UI.

  • CDAP-17057 - Fixed bug that did not allow user to make further changes to preferences when saving preferences returned an error.

  • CDAP-17117 - Fixed styling bug so header of preview tab does not scroll with table.

  • CDAP-17133 - Fixed tab styles for users on Mac with system preferences set to show scrollbars always in Chrome.

  • CDAP-17137 - Fixed bug that showed the preview pipeline as stopping in the UI even when the call to stop the pipeline returned an error.

  • CDAP-17138 - Fixed a bug that caused empty error banner to appear when user stops preview.

  • CDAP-17139 - Fixed styling of preview tab so that side by side tables and record tables are aligned.

  • CDAP-17140 - Fixed bug so error banner for deploy failure shows failure details from backend status message, if they exist.

  • CDAP-17141 - Fixed bug that allowed users to make unsaved config changes. The pipeline config button is now disabled in Preview mode while a run is in progress.

  • CDAP-17145 - Modified preview timer logic to use submitTime instead of pipeline run startTime, to take into account time spent in INIT and WAITING states.

  • CDAP-17161 - Reduced memory footprint for program execution monitoring.

  • CDAP-17166 - Fixed a bug that caused the setting for the number of executors in streaming pipelines to be ignored.

  • CDAP-17171 - Fixed horizontal tab styling to handle the Mac system setting "scrolling always on" in Chrome.

  • CDAP-17172 - Fixed bug that showed banner about stopping pipeline when a pipeline was deployed after running preview.

  • CDAP-17174 - Fixed bug that did not allow the user to stop preview if the pipeline run had already completed.

  • CDAP-17213 - Spark configuration is now picked up correctly from the remote Hadoop cluster for program execution.

  • CDAP-17217 - Fixed overflow styling for long text in preview tables.

  • CDAP-17224 - Fixed an issue where the Dashboard page showed a full graph when there was no run during the selected time period.

  • CDAP-17225 - Fixed a bug that caused pipeline deployment to fail if the pipeline contained comments.

  • CDAP-17233 - Improved Wrangler error messages for incorrect syntax and errors in Wrangler command line.

  • CDAP-17237 - Fixed a bug where the cluster's default Hadoop settings were not being used in pipelines.

  • CDAP-17239 - Fixed bug in StandaloneMain that prematurely deleted the Authorizer classpath directories.

  • CDAP-17243 - Analytics and Rules Engine are now hidden by default in the UI.

  • CDAP-17246 - Fixed an issue so that pipelines exported from CDAP 6.1.x can be imported without changing plugin names in the pipeline. This prevents pipelines from failing during preview or deployment when they are imported from CDAP 6.1.x into 6.2.x or later.

  • PLUGIN-202 - Improved validations on GCS plugins to check for permissions on buckets, and improved error messages for users unable to access a GCS bucket.

  • PLUGIN-367 - Fixed bug where blob file input formats were being split up in Hadoop jobs.

  • PLUGIN-369 - Fixed a bug where customer credential information showed up in the validation logs.

  • PLUGIN-372 - Fixed user experience issue where Bigtable sink and source plugins may fail deployment if they are unable to connect to the Bigtable service.

Created in 2020 by Google Inc.