Allow Wrangler DataPrep UI to handle 14000+ buckets under one connection

Description

User Story:

As a Wrangler user, I want to be able to easily access all of my customer buckets regardless of the total number of buckets, so I can avoid logging into a terminal, manually copying sensitive data into a local file, and uploading it into Wrangler.

Problem:

Today, the Wrangler “Connection” browser is limited to displaying 1000 buckets. This is problematic for our customer storage setup, where we have 14000+ customer buckets: we cannot access any bucket that falls outside the first 1000 in the sorted list. When we try to increase the display limit to ~10000, the browser freezes.

Our current workaround is to log in to our cloud environment, open the file to get the header and first few rows, copy them into a local file, and upload that file into Wrangler. This is insecure and far from ideal.

There is also a related problem: search is a suboptimal user experience, since it does not work properly unless you have already scrolled to the bucket in question, which renders the search bar ineffective for customer storage setups with large numbers of buckets. In an ideal world where I have ~200? buckets, I'd much rather use search than scroll and visually try to find the right range of buckets.

Suggested Solutions?

Increase the directory limit to 10000? This is acceptable if it solves the problem without impacting performance, but I would prefer search to be indexed server-side and instant (described below).
Index buckets server-side and allow searching of buckets to occur instantly (without requiring the user to scroll to the bucket beforehand).
I know there's a pagination story, https://cdap.atlassian.net/browse/CDAP-14446, but ultimately this is suboptimal for our use case because I don’t know which buckets are located on which pages, as our LiveRamp customer buckets are ordered by varying ids in various increments. This would be worse than scrolling IMO.
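To illustrate the server-side indexing suggestion, here is a minimal sketch of a prefix search over a sorted list of bucket names. It is not CDAP code: the `BucketIndex` class, its `search_prefix` method, and the sample bucket names are all hypothetical, and a real implementation would also need to keep the index in sync with the storage provider.

```python
from bisect import bisect_left
from typing import List


class BucketIndex:
    """Hypothetical in-memory index of bucket names with prefix lookup."""

    def __init__(self, names: List[str]) -> None:
        # Sort once so that every later lookup can use binary search.
        self._names = sorted(set(names))

    def search_prefix(self, prefix: str, limit: int = 50) -> List[str]:
        """Return up to `limit` bucket names that start with `prefix`."""
        # In a sorted list, all names sharing a prefix are contiguous and
        # begin at the first position whose name is >= the prefix.
        start = bisect_left(self._names, prefix)
        matches = []
        for name in self._names[start:start + limit]:
            if not name.startswith(prefix):
                break
            matches.append(name)
        return matches


# Made-up bucket names, for illustration only.
index = BucketIndex(["lc-628861-aaa", "lc-100-x", "lc-628862-b", "other"])
print(index.search_prefix("lc-6288"))
```

Each lookup costs a binary search plus at most `limit` comparisons, so the UI would only ever receive a small result set instead of rendering thousands of buckets.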

Acceptance criteria:
When using the Wrangler “Connection” browser for “liveramp_customer_home”, the user can easily access a bucket like lc-628861-xxxxxx without extra engineering intervention.

Release Notes

None

Linked issues

is duplicated by

CDAP-18669

Unable to browse more than 1000 GCS buckets through Wrangler UI

Details

Assignee

Albert Shau

Reporter

Kevin Wei (Deactivated)

Labels

Customer_request, Wrangler, scalability

UX Impact

Yes

Affects versions

6.2.0

Triaged

Yes

Components

Fix versions

Parking Lot

Priority

Major

Created February 2, 2021 at 5:21 AM

Updated February 11, 2022 at 11:09 PM

Activity

Vinisha Shah December 8, 2021 at 11:50 PM
Edited

If this behavior is not planned to be enhanced in the near future, shall we document it as a known issue so that customers are not surprised?

Sree Raman February 11, 2021 at 4:37 PM

Adding the bucket to the URL can work in the meantime:

<INSTANCE_URL>/cdap/ns/default/connections/gcs/cloud_storage_default?prefix=/<BUCKET_NAME>
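As a small sketch of this workaround, the URL can be assembled programmatically. The path pattern is the one given above; the function name `bucket_browse_url`, the parameterised namespace and connection name, the percent-encoding of the prefix, and the example host are assumptions made for illustration.

```python
from urllib.parse import quote


def bucket_browse_url(
    instance_url: str,
    bucket_name: str,
    namespace: str = "default",
    connection: str = "cloud_storage_default",
) -> str:
    """Build a direct link to a bucket, following the pattern in the comment above."""
    base = instance_url.rstrip("/")
    # Percent-encode the bucket name but keep the leading slash (an assumption
    # about what the UI accepts; typical bucket names need no encoding anyway).
    prefix = quote("/" + bucket_name, safe="/")
    return f"{base}/cdap/ns/{namespace}/connections/gcs/{connection}?prefix={prefix}"


# Placeholder host; substitute the real instance URL.
print(bucket_browse_url("https://example.invalid", "lc-628861-xxxxxx"))
```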

Terence Yim February 11, 2021 at 4:25 PM

In fact, having a box for the user to type in is preferable to scrolling through thousands of buckets.

Terence Yim February 11, 2021 at 4:23 PM

As a quick fix, we can add a text box for the user to type in the bucket name if the user already knows the name.