Issues
- Allow Wrangler DataPrep UI to handle 14000+ buckets under one connection (CDAP-17655, Albert Shau)
User Story:
As a Wrangler user, I want to be able to easily access all of my customer buckets regardless of the total number of buckets, so I can avoid logging into a terminal, manually copying sensitive data into a local file, and uploading it into Wrangler.
Problem:
Today, the Wrangler “Connection” browser is limited to displaying 1000 buckets. This is problematic for our customer storage setup, where we have more than 14000 customer buckets: we are constrained by this arbitrary limit and can't access buckets that fall outside the first 1000 customer buckets in sorted order. When we try to increase the display limit to ~10000, the browser freezes.
Our current workaround is to log in to our cloud environment, open the file to get the header and first few rows, copy them into a local file, and upload that file into Wrangler. This is insecure and far from ideal.
There's an additional subtopic: search is a suboptimal user experience, since it does not work properly unless you have already scrolled to the bucket in question (which renders the search bar ineffective for customer storage setups with large numbers of buckets). In an ideal world where I have ~200? buckets, I'd much rather use search than scroll and try to visually find the right range of buckets.
Suggested Solutions?
Increase the directory limit to 10000? This would be acceptable if it solves the problem without impacting performance, but I would prefer search to be indexed server-side and instant (described below).
Index buckets server-side so that bucket search is instant and does not require the user to scroll to the bucket beforehand.
I know there's a pagination story https://cdap.atlassian.net/browse/CDAP-14446, but ultimately this is suboptimal for our use case because I don't know which buckets are located on which pages, as our LiveRamp customer buckets are ordered by IDs that vary in irregular increments. This would be worse than scrolling, IMO.
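To make the server-side search suggestion concrete, here is a minimal sketch of one way it could work: keep the bucket names sorted on the server and answer prefix queries with a binary search, returning only a small page of matches. This is not how Wrangler is implemented. The `BucketIndex` class, its `search_prefix` method, and the synthetic `lc-NNNNNN-demo` names are all hypothetical, and substring (rather than prefix) search would need a different structure.

```python
from bisect import bisect_left


class BucketIndex:
    """Hypothetical in-memory index of bucket names, kept sorted for prefix search."""

    def __init__(self, bucket_names):
        # Sort once up front; each search then costs O(log n + limit).
        self._names = sorted(bucket_names)

    def search_prefix(self, prefix, limit=50):
        """Return up to `limit` bucket names that start with `prefix`, in sorted order."""
        # Binary search for the first name that is >= prefix.
        start = bisect_left(self._names, prefix)
        matches = []
        for name in self._names[start:start + limit]:
            if not name.startswith(prefix):
                break  # Past the block of matching names.
            matches.append(name)
        return matches


# Synthetic names for illustration only (assumption); real names would come
# from the storage service's bucket listing.
index = BucketIndex(f"lc-{i:06d}-demo" for i in range(600000, 614000))
print(index.search_prefix("lc-6128", limit=5))
```

With an index like this, the UI would only ever need to render the handful of matches returned for the typed prefix, instead of loading all 14000 buckets into the browser first.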
Acceptance criteria:
When using the Wrangler “Connection” browser for “liveramp_customer_home”, I can easily access a bucket like lc-628861-xxxxxx without extra engineering intervention.