GroupBy with CollectSet followed by a Python transform returns an error
Description
Release Notes
Fixed an issue in the Python transform that caused it to fail on certain types of array inputs.
Attachments
1
Activity

Albert Shau December 18, 2020 at 12:19 AM

Albert Shau December 17, 2020 at 10:23 PM

Albert Shau February 26, 2020 at 6:37 PM
The Python transform should not assume that the underlying object is a List. An 'array' type can be a Java array or any Java collection.
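A minimal sketch of what handling every kind of 'array' value could look like. The class name ArrayValues and the method toList are hypothetical and are not taken from the actual fix; the sketch only illustrates checking for a Collection or a Java array instead of casting straight to List.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration only; not the code from the actual fix.
final class ArrayValues {
  private ArrayValues() { }

  // Copies an 'array'-typed value into a List without assuming its concrete class.
  static List<Object> toList(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Collection) {
      // Covers List, Set, and any other java.util.Collection.
      return new ArrayList<>((Collection<?>) value);
    }
    if (value.getClass().isArray()) {
      // java.lang.reflect.Array works for Object[] and for primitive arrays such as int[].
      int length = Array.getLength(value);
      List<Object> result = new ArrayList<>(length);
      for (int i = 0; i < length; i++) {
        result.add(Array.get(value, i));
      }
      return result;
    }
    throw new IllegalArgumentException(
        "Expected an array or collection but got " + value.getClass().getName());
  }
}
```

With a helper like this, a Set, a List, and an int[] would all reach the Python code as the same list shape, so the choice of aggregate function upstream would no longer matter.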
Fixed
Created February 26, 2020 at 3:58 PM
Updated January 13, 2021 at 1:11 AM
Resolved December 18, 2020 at 12:25 AM
The steps to reproduce the issue:
Create a Source with a CSV file, for example:
The schema is name: string and value: string.
Add a GroupBy transform, grouping by name and a CollectSet of the value column.
Add an empty Python transform.
Add a Sink (Trash sink for testing).
When the pipeline is run, it returns a cast exception.
With CollectList it works correctly.
Attached is a sample pipeline reading from GCS (real paths and project info removed).
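The difference between CollectSet and CollectList can be sketched in isolation. This assumes, without confirmation from the ticket, that CollectSet produces a java.util.Set and CollectList produces a java.util.List; the class CastRepro and the method castsToList are hypothetical names that only mimic code casting every 'array' value to List.

```java
import java.util.List;

// Hypothetical reproduction for illustration only; not the plugin's code.
final class CastRepro {
  private CastRepro() { }

  // Mimics code that assumes every 'array' value is a List.
  // Returns true if the cast succeeds, false if it throws ClassCastException.
  static boolean castsToList(Object arrayValue) {
    try {
      List<?> unused = (List<?>) arrayValue;
      return true;
    } catch (ClassCastException e) {
      return false;
    }
  }
}
```

Under that assumption, a value such as Arrays.asList("a", "b") passes the cast, while a value such as new HashSet<>(Arrays.asList("a", "b")) fails it, which would match the behavior reported above.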