CDAP Release 6.1.4

Important: CDAP 6.1.4 is deprecated.

Summary

This release provides performance and scalability improvements that increase developer productivity and optimize pipeline runtime performance. The release includes scaled-up previews that support up to 50 concurrent runs, capabilities to handle large and complex schemas in Pipeline Studio, an enhanced log viewer, and other critical improvements and fixes. Some of the highlights are:

Features

  • Added support to create autoscaling Dataproc clusters.

  • Added schema support feature in the UI to edit precision and scale.

  • Improved memory performance in pipelines by using a disk-only auto-caching strategy.

Performance and Scalability Improvements

  • Added support for 50 users running previews at the same time.

  • Added support for large and deeply nested schemas (>5K fields with 20+ levels of nesting).

  • Added ability to optimize the performance of some pipelines with a new, experimental setting spark.cdap.pipeline.consolidate.stages.
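
As a rough illustration of turning on the experimental setting named above as an application preference, the following Python sketch builds the HTTP request without sending it. The endpoint path reflects the CDAP preferences REST API as commonly documented and is an assumption here, not something stated in these notes; the host, port, namespace, and pipeline name are hypothetical placeholders. Verify the path against the REST API reference for your CDAP version before using it.

```python
import json
from urllib.parse import quote

# Setting name taken from these release notes.
SETTING = "spark.cdap.pipeline.consolidate.stages"


def build_preference_request(base_url, namespace, app):
    """Return (method, url, body) for a request that enables the setting for one app.

    The /v3/.../preferences path is an assumption about the CDAP REST API;
    this function only builds the request and does not contact any server.
    """
    url = "{}/v3/namespaces/{}/apps/{}/preferences".format(
        base_url.rstrip("/"),
        quote(namespace, safe=""),
        quote(app, safe=""),
    )
    # Preferences are sent as a JSON map of string keys to string values.
    body = json.dumps({SETTING: "true"})
    return "PUT", url, body


if __name__ == "__main__":
    # Hypothetical host, namespace, and pipeline name.
    method, url, body = build_preference_request(
        "http://localhost:11015", "default", "my_pipeline"
    )
    print(method, url)
    print(body)
```

The same key and value can instead be supplied as a runtime argument for a single run, which keeps the experimental behavior scoped to that run.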

New Features

  • CDAP-16980 - Added new Log Viewer feature that enables users to see the most recent logs.

  • CDAP-16836 - Added new options in CDAP CLI to take a URI instead of a host and port combination.

  • CDAP-16690 - Added a revamped Preview tab with a new Record view for large schemas.

Performance and Scalability Improvements

  • PLUGIN-282 - Added a new Data Cacher plugin that allows users to manually cache data at certain points in a pipeline.

  • PLUGIN-174 - Enabled macros for hostname, port, and database name in database-specific plugins.

  • CDAP-17179 - Added two new properties, Filesystem properties and Output File Prefix, for the Google Cloud Storage Sink.

  • CDAP-17130 - Added Joiner distribution support to MapReduce and streaming pipelines.

  • CDAP-17123 - Made the records.updated metric available for the Google Cloud Storage Batch Sink plugin.

  • CDAP-17095 - Added Distribution to AutoJoiner API to increase performance for skewed joins.

  • CDAP-17078 - Added an experimental setting to consolidate multiple pipeline branches into single operations in Spark pipelines. This can improve performance in pipelines by avoiding recomputation. This can be turned on by setting a preference or runtime argument for spark.cdap.pipeline.consolidate.stages to true.

  • CDAP-17077 - Changed the auto-caching strategy in Spark pipelines to default to disk-only caching instead of memory caching, because of common out-of-memory failures. Also changed the caching strategy to cache only at places that would prevent sources from being recomputed, instead of the more aggressive caching done previously.

  • CDAP-16712 - Improved the scalability of the preview system when running in a Kubernetes environment by separating preview runs into their own individual pods. The preview manager pod is now responsible only for handling the preview REST API.

  • CDAP-16697 - Created Best Practices guide for Spark engine tuning.

  • CDAP-16682 - When the backend is slow to respond to requests from the UI, the UI now shows a snackbar indicating that there is a delay.

  • CDAP-16668 - Added support for creating autoscaling Dataproc clusters.

  • CDAP-16850 - Introduced a new schema editor for plugins in pipelines. In addition to supporting large schemas (>5K fields), the schema editor supports editing attributes of decimal types (precision and scale).

  • CDAP-17015 - Updated Preview to show the number of preview runs pending before the current run, if there are any. The number of pending runs is shown under the timer in the UI.
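
To illustrate why distributing a join can help with skewed keys (CDAP-17095 and CDAP-17130 above), the following Python sketch shows key salting, a general technique often used for skewed joins. It is not taken from the CDAP source, and whether CDAP's Distribution feature works exactly this way is an assumption. The function name, the factor parameter, and the sample data are hypothetical.

```python
import random
from collections import defaultdict


def salted_join(skewed, small, factor, seed=0):
    """Inner-join two lists of (key, value) pairs using key salting.

    Each row on the skewed side gets one random salt in [0, factor), so a hot
    key is spread over up to `factor` distinct join keys. Each row on the
    small side is replicated once per salt so every salted key finds a match.
    """
    rng = random.Random(seed)

    # Replicate the small side: one copy per possible salt value.
    index = defaultdict(list)
    for key, value in small:
        for salt in range(factor):
            index[(key, salt)].append(value)

    # Salt the skewed side and look up the matching replicated rows.
    joined = []
    for key, value in skewed:
        salt = rng.randrange(factor)
        for other in index.get((key, salt), []):
            joined.append((key, value, other))
    return joined


if __name__ == "__main__":
    # Hypothetical data: key "a" is heavily skewed.
    skewed_rows = [("a", i) for i in range(6)] + [("b", 100)]
    small_rows = [("a", "x"), ("b", "y")]
    for row in salted_join(skewed_rows, small_rows, factor=3):
        print(row)
```

The joined output is the same as for an unsalted inner join. The trade-off is that the smaller side is replicated by the chosen factor, in exchange for spreading the hot key across more join keys.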

Bug Fixes

  • PLUGIN-372 - Fixed a user experience issue where Google Cloud Bigtable sink and source plugins could fail deployment if they were unable to connect to the Google Cloud Bigtable service.

  • PLUGIN-369 - Fixed a bug where customer credential information appeared in the validation logs.

  • PLUGIN-367 - Fixed a bug where blob file input formats were being split up in Hadoop jobs.

  • PLUGIN-245 - Fixed table key validation in the Google Cloud BigQuery sink when the table key is a macro.

  • PLUGIN-206 - Fixed a region error message discrepancy of Google Cloud BigQuery service API on their end.

  • PLUGIN-202 - Improved validations on Google Cloud Storage plugins to check for permissions on buckets, and improved error messages for users unable to access a Google Cloud Storage bucket.

  • CDAP-17171 - Fixed horizontal tab styling to handle the macOS system setting "scrolling always on" in Chrome.

  • CDAP-17166 - Fixed a bug that caused the setting for the number of executors in streaming pipelines to be ignored.

  • CDAP-17161 - Reduced memory footprint for program execution monitoring.

  • CDAP-17154 - Fixed a race condition that prevented runtime monitoring from working properly when programs were launched concurrently, which resulted in program states failing to transition and in missing metadata.

  • CDAP-17153 - Modified the Preview tab so that multiple inputs or outputs are shown with tabs in table mode.

  • CDAP-17141 - Fixed a bug that allowed users to make unsaved config changes. The pipeline config button is now disabled in Preview mode while a run is in progress.

  • CDAP-17140 - Fixed a bug so that the error banner for a deploy failure shows failure details from the backend status message, if they exist.

  • CDAP-17139 - Fixed styling of the Preview tab so that side-by-side tables and record tables are aligned.

  • CDAP-17135 - Fixed a race condition when stopping a Spark program in Standalone that could cause the stop to hang.

  • CDAP-17133 - Fixed tab styles for users on macOS with system preferences set to always show scrollbars in Chrome.

  • CDAP-17117 - Fixed a styling bug so that the header of the Preview tab does not scroll with the table.

  • CDAP-17097 - Fixed a bug that caused Splitter transforms to be unable to fetch their output ports and schemas.

  • CDAP-17074 - Improved state transitions for starting pipelines in app fabric to increase stability if app fabric unexpectedly restarts.

  • CDAP-17057 - Fixed a bug that prevented users from making further changes to preferences after saving preferences returned an error.

  • CDAP-17045 - Fixed a bug so that large pipelines with a hyphen (-) in the name overflow properly in the UI.

  • CDAP-17044 - Added validation of column names for the BigQuery sink.

  • CDAP-17043 - Fixed a bug in how the dropdown menu for Wrangler tabs is displayed. Previously, the dropdown overlapped other UI elements and hindered use of the UI.

  • CDAP-16930 - Missing plugins in a pipeline now have their properties button disabled, with a tooltip.

  • CDAP-16754 - Preview shows logical types in ISO format.

  • CDAP-16747 - Modified loading screen for Preview tab.

  • CDAP-16414 - GraphQL errors are now displayed using the standard page-level error or error banner, depending on severity.

  • CDAP-15869 - Preview displays logical types as strings.

  • CDAP-12499 - Clarified error message for when branches of a conditional are used as inputs to the same node.

Created in 2020 by Google Inc.