CDAP Release 6.3.0

Important: CDAP 6.3.0 is deprecated.

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. The main highlight of the release is:

Replication

  • Added metrics for the amount of data processed and the error count in replication pipelines.

  • Improved the Replication UI page for a better user experience.

New Features

CDAP-16835 - Added support for upgrading applications via REST API. For example, you can upgrade all pipelines in a namespace to use the latest available artifacts.

CDAP-16836 - Added new CDAP CLI options that accept a URI instead of a host and port combination.

CDAP-16980 - Added a new Log Viewer that lets users see the most recent logs.

CDAP-17355 - Added Draft count metric and created Drafts API to manage drafts in the backend.

CDAP-17418 - Added support for replicating databases that have a "schema" concept, where a schema is a collection of database objects.

CDAP-17460 - Redesigned the Replication Detail page.

CDAP-17461 - Redesigned the Dashboard page into the Operations page.
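
For CDAP-16835, a request to upgrade every application in a namespace might be built as in the following minimal Python sketch. The endpoint path, the router address http://localhost:11015, and the helper name build_upgrade_request are assumptions made for illustration and are not stated in these notes; consult the Lifecycle REST API reference for the exact path and parameters. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_upgrade_request(base_url: str, namespace: str) -> Request:
    """Build (but do not send) a POST request to upgrade all apps in a namespace."""
    # ASSUMPTION: the upgrade endpoint lives at /v3/namespaces/<namespace>/upgrade.
    # Verify the path against the CDAP REST API documentation before use.
    url = f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}/upgrade"
    return Request(url, method="POST")


# Hypothetical router address and namespace, for illustration only.
req = build_upgrade_request("http://localhost:11015", "default")
print(req.get_method(), req.full_url)
```

To actually send the request, pass req to urllib.request.urlopen and add whatever authentication headers your deployment requires.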

Improvements

CDAP-16812 - Updated labels and descriptions for Service Account properties in the Dataproc provisioner.

CDAP-16815 - Added a records.updated metric to the BigQuery sink. It reports the total number of inserts, updates, and upserts into the sink.

CDAP-16918 - Introduced a new REST API for getting all application details across all namespaces.

CDAP-16929 - Added the ability to select a Custom Dataproc Image. The complete URI for the custom image should be specified.

CDAP-17015 - Updated Preview to show the number of Preview runs pending before the current run, if any. The number of pending runs is shown under the timer in the UI.

CDAP-17065 - Disabled Spark YARN app retries since Spark already performs retries at a task level.

CDAP-17077 - Changed the auto-caching strategy in Spark pipelines to default to disk-only caching instead of memory caching, because out-of-memory failures were common. Also changed the caching strategy to cache only at places that prevent sources from being recomputed, instead of the more aggressive caching done previously.

CDAP-17078 - Added a setting to consolidate multiple pipeline branches into single operations in Spark pipelines. This can improve performance in pipelines by avoiding recomputation. This can be turned on by setting a preference or runtime argument for 'spark.cdap.pipeline.consolidate.stages' to 'true'.

CDAP-17095 - Added Distribution to AutoJoiner API to increase performance for skewed joins.

CDAP-17123 - Made "records.updated" metric available for GCS Batch Sink plugin.

CDAP-17130 - Added Joiner Distribution support to MapReduce and streaming pipelines.

CDAP-17179 - Added new properties `Filesystem properties` and `Output File Prefix` for GCS Sink.

CDAP-17182 - Enabled traffic compression in the Runtime service.

CDAP-17198 - Added the Runtime service to the system service statuses.

CDAP-17202 - Improved commit performance for sinks.

CDAP-17249 - Added documentation about Regex Path Filter property to File and GCS sources.

CDAP-17389 - Added options for master and worker disk type and fixed the Dataproc provisioner to use the configured disk settings for secondary workers on autoscale clusters.

CDAP-17425 - Exposed the number of preview records requested to source plugins.

CDAP-17428 - Changed pipeline stage consolidation to be enabled by default. This improves the performance of certain types of pipelines.

CDAP-17439 - Added support for Hadoop 3 and Spark 3 for program execution.

CDAP-17462 - Delta source developers no longer need to populate previous rows in the update event if the delta source supports row_id, a unique identifier for a row.

CDAP-17484 - Replication Assessment page now displays an error when a user selects two source tables with the same name to replicate, which is not supported.

PLUGIN-282 - Added new Data Cacher plugin to allow users to manually cache data at certain points in a pipeline.

PLUGIN-303 - Added Distribution settings to Joiner plugin for increased performance in skewed joins.
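
For the stage consolidation setting described in CDAP-17078 and CDAP-17428, the runtime arguments could be assembled as in this minimal Python sketch. The key name comes from these notes; the helper name with_consolidation, the example argument input.path, and the idea that the value 'false' turns consolidation off are assumptions for illustration.

```python
import json

# Key name as given in the CDAP-17078 entry.
CONSOLIDATE_KEY = "spark.cdap.pipeline.consolidate.stages"


def with_consolidation(runtime_args: dict, enabled: bool) -> dict:
    """Return a copy of runtime_args with the consolidation setting applied."""
    merged = dict(runtime_args)  # copy so the caller's dict is not mutated
    # Preferences and runtime arguments are string-valued, so use 'true'/'false'.
    # ASSUMPTION: 'false' disables consolidation; the notes only mention 'true'.
    merged[CONSOLIDATE_KEY] = "true" if enabled else "false"
    return merged


# Hypothetical existing runtime argument, for illustration only.
args = with_consolidation({"input.path": "gs://example-bucket/in"}, enabled=True)
print(json.dumps(args, indent=2))
```

The resulting map can be supplied as runtime arguments when starting the pipeline or saved as a preference.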

Bug Fixes

CDAP-16797 - The CDAP UI now validates Pipeline Alerts before adding them to the Pipeline Studio.

CDAP-16816 - Fixed schedule properties to overwrite preferences set on the application instead of the other way around. This most visibly fixed a bug where the compute profile set on a pipeline schedule or trigger would get overwritten by the profile for the pipeline.

CDAP-16824 - Fixed UI to show plugin properties for plugins that don't have a plugin widget. 

CDAP-16845 - Fixed a bug that started running Preview for pipelines with post-run actions even if the user chooses the option to not run Preview.

CDAP-16870 - Fixed PySpark support to work with Spark 2.1.3+.

CDAP-16879 - For BigQuery sinks, if both Truncate Table and Update Table Schema are set to True, when you run the pipeline, only Truncate Table will be applied. Update Table Schema will be ignored.

CDAP-16880 - Removed schema validation from BQ sink when 'Truncate Table' option is set to True.

CDAP-16891 - Unsupported pipelines in drafts would be upgraded when users open them.

CDAP-16898 - Fixed a bug that did not fetch Preview data when the plugin label had spaces in it.

CDAP-16950 - Includes all ERROR level logs logged under the application logging context.

CDAP-16959 - Fixed an issue in Preview with runtime arguments re-rendering and losing focus when containing macros.

CDAP-16972 - Fixed an issue where Preview config would open when trying to stop a Preview.

CDAP-16975 - If there are multiple versions of a plugin, the latest version is now the default and is the version that gets added to pipelines. If the user has already chosen a specific version (older version), it defaults to that instead of the latest.

CDAP-16976 - UI resets the default version of plugins for specific users during upgrade. When users upgrade from 6.1.2 to 6.1.3 or later, UI will reset the default version of the plugin the user has already chosen. Post upgrade, if the user uses the same plugin, UI will choose the latest version of the same plugin.

CDAP-16993 - Fixed a bug in Preview for fields that have non-string types such as bytes.

CDAP-17000 - Changed the default value of spark.network.timeout to 10 minutes to make pipeline execution more stable for shuffle-heavy pipelines.

CDAP-17029 - Fixed an issue that caused an extra empty row to appear when sampling GCS text files in Wrangler.

CDAP-17043 - Fixed the dropdown menu for Wrangler tabs so that it displays correctly. Previously, the dropdown overlapped other UI elements and hindered use of the UI.

CDAP-17044 - Column names are now validated for the BigQuery sink.

CDAP-17045 - Fixed a bug so that large pipelines with `-` in the name overflow properly in the UI.

CDAP-17057 - Fixed a bug that did not allow a user to make further changes to preferences when saving preferences returned an error.

CDAP-17059 - Added a check to fail pipeline deployment if there is an action in the middle of the pipeline.

CDAP-17074 - Improved state transitions for starting pipelines in app fabric to increase stability if app fabric unexpectedly restarts.

CDAP-17097 - Fixed a bug that caused Splitter transforms to be unable to fetch their output ports and schemas.

CDAP-17117 - Fixed a styling bug so that the header of the Preview tab does not scroll with the table.

CDAP-17121 - Fixed a bug where Preview runs failed on null values due to a NullPointerException in the JSON encoder.

CDAP-17133 - Fixed tab styles for users on Mac with system preferences set to show scrollbars always in Chrome.

CDAP-17135 - Fixed a race condition in stopping a Spark program in Standalone CDAP that could cause the stop to hang.

CDAP-17137 - Fixed a bug that showed the preview pipeline as stopping in the UI even when the call to stop the pipeline returned an error.

CDAP-17138 - Fixed a bug that caused an empty error banner to appear when the user stops Preview.

CDAP-17139 - Fixed styling of Preview tab so that side by side tables and record tables are aligned.

CDAP-17140 - Fixed a bug so that the error banner for a deploy failure shows failure details from the backend status message, if they exist.

CDAP-17141 - Fixed a bug that allowed a user to make unsaved config changes. The Pipeline Config button is now disabled in Preview mode while a run is in progress.

CDAP-17145 - Modified preview timer logic to use submitTime instead of pipeline run startTime, to take into account time spent in INIT and WAITING states.

CDAP-17161 - Reduced the memory footprint of program execution monitoring.

CDAP-17166 - Fixed a bug that caused the setting for the number of executors in streaming pipelines to be ignored.

CDAP-17171 - Fixed horizontal tab styling to handle the Mac system setting "scrolling always on" in Chrome.

CDAP-17172 - Fixed a bug that showed a banner about stopping the pipeline when a pipeline was deployed after running Preview.

CDAP-17174 - Fixed a bug that prevented the user from stopping Preview if the pipeline run had already completed.

CDAP-17213 - Fixed program execution to pick up the Spark configuration correctly from the remote Hadoop cluster.

CDAP-17217 - Fixed overflow styling for long text in preview tables.

CDAP-17224 - Fixed an issue where the Dashboard page showed a full graph when there were no runs during the selected time period.

CDAP-17225 - Fixed a bug that caused pipeline deployment to fail if the pipeline contained comments.

CDAP-17233 - Improved Wrangler error messages for incorrect syntax and errors in Wrangler command line.

CDAP-17237 - Fixed a bug where the cluster's default Hadoop settings were not being used in pipelines.

CDAP-17239 - Fixed a bug in StandaloneMain that prematurely deleted the Authorizer classpath directories.

CDAP-17243 - Analytics and Rules Engine are now hidden from the UI by default.

CDAP-17246 - Fixed the import of pipelines exported from CDAP 6.1.x so that plugin names in the pipeline are not changed. This prevents pipelines from failing during preview or deployment when they are imported from CDAP 6.1.x into 6.2.x or later.

CDAP-17268 - Fixed a bug in schema editor to handle default type for keys (string) in map type if the existing schema doesn't have any type for keys.

CDAP-17323 - Fixed a bug that caused the Existing Dataproc provisioner to check for a network unnecessarily.

CDAP-17379 - MySQL sources for Replication now require MySQL JDBC driver version 8 or later.

CDAP-17386 - Fixed a bug in Replication where MySQL source failed with NullPointerException when the data is null and a logical type.

CDAP-17408 - Fixed a bug that caused the number of partitions set by aggregator plugins to be ignored.

CDAP-17473 - Fixed a bug that prevented the use of macro values for Project ID and Path in the GCS source plugin.

CDAP-17557 - Fixed an issue that caused the SQL Server Replicator to generate a duplicate event when it restarted.

PLUGIN-202 - Improved validations on GCS plugins to check for permissions on buckets, and improved error messages for users unable to access a GCS bucket.

PLUGIN-206 - BigQuery service API fixed a region error message discrepancy on their end.

PLUGIN-245 - Fixed BigQuery sink with macro table key validation.

PLUGIN-367 - Fixed a bug where blob file input formats were being split up in Hadoop jobs.

PLUGIN-370 - Improved some Cassandra validations.

PLUGIN-372 - Fixed a user experience issue where Bigtable sink and source plugins could fail deployment if they were unable to connect to the Bigtable service.

PLUGIN-386 - Added support for BigQuery Views and Materialized Views to Wrangler.

PLUGIN-388 - Fixed output schema validation for GCS sinks with Format set to parquet.
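
The precedence fixed in CDAP-16816, where schedule properties overwrite application preferences, can be pictured with this minimal Python sketch. It only illustrates the merge order and is not CDAP's implementation; the function name resolve_properties, the key system.profile.name, and all values are assumptions for illustration.

```python
def resolve_properties(app_preferences: dict, schedule_properties: dict) -> dict:
    """Merge two property maps so that schedule properties take precedence."""
    resolved = dict(app_preferences)      # start from the application preferences
    resolved.update(schedule_properties)  # schedule values overwrite on key clashes
    return resolved


# ASSUMPTION: illustrative key name for the compute profile setting.
PROFILE_KEY = "system.profile.name"

# Hypothetical values: the pipeline and its schedule name different profiles.
app_prefs = {PROFILE_KEY: "pipeline-profile", "example.setting": "from-app"}
schedule_props = {PROFILE_KEY: "schedule-profile"}

resolved = resolve_properties(app_prefs, schedule_props)
print(resolved[PROFILE_KEY])
```

With this merge order the profile set on the schedule is the one used, while preferences that the schedule does not set are kept.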

Created in 2020 by Google Inc.