CDAP Release 6.2.1

Important: CDAP 6.2.1 is deprecated.

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:

  • Joiner Performance Improvements. Implemented performance improvements to joiner plugins. Joins can now also be performed in-memory if one side is small, and behavior on null keys can be chosen by the user.

  • Aggregator Plugin Performance Improvements. Improved aggregator performance for the Spark engine.
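
As an illustration of the joiner highlight above, the following is a minimal sketch of an in-memory (hash) join with a switch for null-key behavior. It is not CDAP code: the function name in_memory_inner_join, the null_keys_match flag, and the sample records are all hypothetical and only show the general technique of indexing the small side in memory and streaming the large side past it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]


def in_memory_inner_join(
    large: Iterable[Record],
    small: Iterable[Record],
    key: str,
    null_keys_match: bool = False,  # hypothetical flag, not a CDAP property name
) -> Iterator[Tuple[Record, Record]]:
    """Inner-join two record streams by holding the small side in a hash table."""
    # Build phase: index the small side by key; this side must fit in memory.
    index: Dict[Any, List[Record]] = defaultdict(list)
    for row in small:
        k = row.get(key)
        if k is None and not null_keys_match:
            continue  # SQL-style semantics: a null key matches nothing
        index[k].append(row)

    # Probe phase: stream the large side and look up matches, with no shuffle.
    for row in large:
        k = row.get(key)
        if k is None and not null_keys_match:
            continue
        for match in index.get(k, []):
            yield row, match


# Hypothetical sample data: one purchase and one user have a null key.
purchases = [
    {"user": 1, "item": "a"},
    {"user": None, "item": "b"},
    {"user": 2, "item": "c"},
]
users = [{"user": 1, "name": "ann"}, {"user": None, "name": "unknown"}]

strict = list(in_memory_inner_join(purchases, users, "user"))
lenient = list(in_memory_inner_join(purchases, users, "user", null_keys_match=True))
```

With the sample data, strict holds only the pair for user 1, while lenient also pairs the two null-key records.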

New Features

  • CDAP-16708 - Added a new AutoJoiner API for plugins to implement. The new API leaves implementation details up to the application, which can perform join optimizations that were not possible with the older Joiner API.

  • CDAP-16855 - Introduced a new aggregator API to achieve better performance when using the Apache Spark engine.

  • CDAP-16918 - Introduced a new REST API for getting all application details across all namespaces.
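
The notes above do not describe the shape of the new aggregator API. One common reason a reduce-style aggregator performs better on Spark is that each partition can fold its rows into a small partial state before any data is shuffled. The sketch below illustrates that general idea only. The function names and the (sum, count) state are assumptions made for illustration and are not taken from CDAP.

```python
from typing import Dict, Iterable, List, Tuple

# (sum, count) partial state for an average; hypothetical, not a CDAP type.
Partial = Tuple[float, int]


def aggregate_partition(rows: Iterable[Tuple[str, float]]) -> Dict[str, Partial]:
    """Fold one partition's rows into a small per-key partial state."""
    state: Dict[str, Partial] = {}
    for key, value in rows:
        total, count = state.get(key, (0.0, 0))
        state[key] = (total + value, count + 1)
    return state


def merge_partials(partials: Iterable[Dict[str, Partial]]) -> Dict[str, Partial]:
    """Combine the partial states produced by different partitions."""
    merged: Dict[str, Partial] = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            m_total, m_count = merged.get(key, (0.0, 0))
            merged[key] = (m_total + total, m_count + count)
    return merged


def finalize(merged: Dict[str, Partial]) -> Dict[str, float]:
    """Turn the merged state into the final average per key."""
    return {key: total / count for key, (total, count) in merged.items()}


# Two hypothetical partitions of (group key, value) rows.
partitions: List[List[Tuple[str, float]]] = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
]
averages = finalize(merge_partials(aggregate_partition(p) for p in partitions))
```

Only the compact per-key state crosses partition boundaries here, rather than every row in a group, which is the property that makes this style of aggregation cheaper to shuffle.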

Improvements

  • CDAP-16711 - Added the ability for Joiner plugins to specify whether null keys should match other null keys.

  • CDAP-16461 - Added a Spark parameter that limits the Spark block size to prevent issues with joins.

  • CDAP-16455 - Dataproc job logs now include logs emitted from the job main class.

  • CDAP-16682 - When the backend is slow to respond to requests from the UI, the UI now shows a delay notification.

  • CDAP-16690 - Revamped the preview tab with a new Record view for large schemas.

  • CDAP-16835 - Added support for upgrading applications via the REST API. For example, it can be used to upgrade all pipelines in a namespace to use the latest available artifacts.

  • CDAP-16836 - Introduced a -l option in the CDAP CLI that accepts a URI in the new format http[s]://hostname:[port].

  • CDAP-16980 - Log viewer now allows users to see the most recent logs.

  • CDAP-16606 - Limited preview to reading 100 records across all input partitions.

  • CDAP-16621 - Removed the modal showing pipeline JSON when users export pipelines. Instead, the pipeline is downloaded when users click "export pipeline", without the extra confirmation step.

  • CDAP-16673 - Added payload compression support to the messaging service.

  • CDAP-16676 - Upgraded to Dataproc API v1beta2 to allow endpoint configuration.

  • CDAP-16709 - Improved joiner plugin performance by capping the required memory at around 4 GB per executor instead of letting it grow with the skew of the join. Joins can now also be performed in-memory if one side is small, and users can choose the behavior on null keys.

  • CDAP-16815 - Added a records.updated metric to the BigQuery sink. It counts the total number of inserts, updates, and upserts into the sink.

  • CDAP-16929 - Added the ability to select a Custom Dataproc Image. The complete URI for the custom image should be specified.

  • CDAP-16975 - The UI now adds the latest available version of a plugin when the plugin is added from the side panel in the Pipeline Studio. If the user has already chosen a specific (older) version, the UI defaults to that version instead of the latest.

  • CDAP-16976 - The UI now resets each user's default plugin versions during upgrade. When users upgrade from 6.1.2 to 6.1.3 or later, the UI resets the default version of any plugin the user has already chosen. After the upgrade, if the user uses the same plugin, the UI chooses the latest version of that plugin.

  • CDAP-17000 - Changed the default value of spark.network.timeout to 10 minutes to make pipeline execution more stable for shuffle-heavy pipelines.
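
To make the URI format accepted by the CLI -l option (CDAP-16836) concrete, here is a small sketch that splits a URI of the form http[s]://hostname:[port] into its parts. It is not the CLI's own parser. The CdapEndpoint type, the parse_endpoint function, and the example hostname are hypothetical, and treating the port as optional is an assumption based on the brackets in the format string.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class CdapEndpoint(NamedTuple):  # hypothetical helper type, not part of CDAP
    ssl: bool
    hostname: str
    port: Optional[int]  # None when the URI does not specify a port


def parse_endpoint(uri: str) -> CdapEndpoint:
    """Split an http[s]://hostname:[port] URI into scheme, host, and port."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URI, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing hostname in {uri!r}")
    return CdapEndpoint(parts.scheme == "https", parts.hostname, parts.port)


# Example hostname and port are placeholders.
endpoint = parse_endpoint("https://cdap.example.com:10443")
```

A value without a scheme, such as localhost:11015, is rejected with a ValueError in this sketch, which makes the difference from a bare host and port pair explicit.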

Bug Fixes

  • CDAP-17003 - Removed redundant validations from the BQ sink, which should reduce calls to BigQuery.getTable().

  • CDAP-16530 - Fixed joiner output schema generation to be deterministic, with fields in the same order as in the input data.

  • CDAP-15869 - Fixed preview to display logical types as strings.

  • CDAP-16222 - Fixed the package references in the dynamic Spark plugin to use io.cdap instead of co.cask.

  • CDAP-16340 - Fixed the joiner plugin to allow a nullable key on one side and a non-nullable key on the other.

  • CDAP-16367 - Fixed a bug where field lineage was incorrect when a source was directly connected to a sink.

  • CDAP-16487 - Fixed regex for empty filter in the Wrangler UI.

  • CDAP-16731 - Fixed a bug where the GroupBy aggregator required an alias different from the field name.

  • CDAP-16760 - Fixed a bug where memory, cpu, and engine config properties were not being set for Spark program plugins.

  • CDAP-16786 - Fixed listing pipelines by tags on the Pipelines list page.

  • CDAP-16797 - The CDAP UI now validates post-run actions before adding them to a pipeline in the Pipeline Studio.

  • CDAP-16845 - Fixed a bug that started a preview run for pipelines with post-run actions even if the user chose not to run preview.

  • CDAP-16870 - Fixed PySpark support to work with Spark 2.1.3+.

  • CDAP-16879 - When the 'truncate table' and 'update schema' options are set together, only WRITE_TRUNCATE is now applied to the BQ job.

  • CDAP-16880 - Removed schema validation from BQ sink when 'truncate table' option is set.

  • CDAP-16891 - Unsupported pipelines in drafts are now upgraded when users open them.

  • CDAP-16927 - Added validation to ensure that account name ends with ".blob.core.windows.net" in the Azure Blob Store plugin.

  • CDAP-16950 - Fixed logging to include all ERROR level logs logged under the application logging context.

  • CDAP-16959 - Fixed an issue in preview where runtime arguments containing macros re-rendered and lost focus.

  • CDAP-16972 - Fixed an issue where preview config would open when trying to stop a preview.

  • CDAP-16993 - Fixed a bug in preview for fields that have non-string types such as bytes.

  • CDAP-17029 - Fixed an issue that caused an extra empty row to appear when sampling GCS text files in the Wrangler.

  • CDAP-17043 - Fixed the dropdown menu for Wrangler tabs. The previous dropdown overlapped other UI elements, which hindered use of the UI.

  • CDAP-17045 - Fixed a bug that prevented large pipelines with - in the name from rendering properly in the UI.

  • CDAP-17074 - Improved state transitions for starting pipelines in the App Fabric system service to increase stability if the service restarts unexpectedly.

  • CDAP-17097 - Fixed a bug that caused splitter transforms to be unable to fetch their output ports and schemas.

  • CDAP-17135 - Fixed a race condition when stopping a Spark program in Standalone that could cause the stop to hang.
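
The deterministic schema ordering fix (CDAP-16530) can be pictured with a short sketch. It is not the joiner's actual implementation. The joined_schema function, the stage-qualified field names, and the sample schemas are hypothetical, and the sketch only shows how walking the inputs in their given order yields the same output field order on every run, which iterating an unordered collection would not guarantee.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field name, type name); hypothetical representation


def joined_schema(inputs: Dict[str, List[Field]]) -> List[Field]:
    """Build the output field list by walking stages and fields in input order."""
    output: List[Field] = []
    # Python dicts preserve insertion order (3.7+), so stage order is stable.
    for stage, fields in inputs.items():
        for name, type_name in fields:
            output.append((f"{stage}.{name}", type_name))
    return output


# Hypothetical input schemas for a two-sided join.
schema = joined_schema(
    {
        "left": [("id", "int"), ("name", "string")],
        "right": [("id", "int"), ("total", "double")],
    }
)
```

Because the result is a list built in input order, repeated runs over the same inputs produce an identical schema.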

Created in 2020 by Google Inc.