
Total time to process records in a stage is misleading in UI

Description

TL;DR - The total pipeline run time, the time span of the time-series graph we show for records in a specific stage, and the ACTUAL time spent in that stage are all different. The UI doesn't show this distinction, so users draw the wrong conclusions when inferring the # of records processed.

Longer version -

The individual stage metrics that we show in the UI don't give users the full picture needed to infer the right information.

Scenario

  1. Create a simple pipeline with a BQ source (~7M records) and a BQ sink

  2. Publish and run the pipeline

  3. Once the pipeline run is complete, open the metrics of the individual stages

In this specific example, after step 3 the UI shows a graph where:

  • The X-axis indicates time in seconds and spans 180 seconds

  • The Y-axis indicates the total # of records in for the (source) stage

From the graph, the user infers that the stage ran for 180 seconds and processed ~7M records.

Below the graph we show a table with metrics such as process rate (# of records per second) and processing time.

The table shows ~298,986 records processed per second.

When the user tries to correlate the data from the graph with the table, the numbers don't add up. Here is why:

  • The graph shows the pipeline stage processed records for 180 seconds

  • The table below shows ~298,986 records processed per second

  • According to this calculation, if the stage processed ~298,986 records per second, then over 180 seconds it should have processed 53,817,569 (~53M) records, far more than the ~7M actually processed (see the sketch below)
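
To make the mismatch concrete, here is a minimal sketch (in TypeScript) of the inference a user reasonably makes from the UI today. The numbers are taken from this example; processRate, graphSpanSeconds, and inferredRecords are hypothetical names for illustration, not actual UI or API fields:

  // What the UI currently shows for this stage (numbers from this report).
  const processRate = 298_986;     // records/second, from the metrics table
  const graphSpanSeconds = 180;    // X-axis span of the time-series graph

  // The natural inference: rate multiplied by the visible duration.
  const inferredRecords = processRate * graphSpanSeconds;
  console.log(inferredRecords.toLocaleString()); // ~53.8M, nowhere near the ~7M shown on the graph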

We get the total time metric from the backend (26s in this specific example), but we don't surface it anywhere in the UI. If the user could see the total time spent on the specific stage, the displayed rate of records processed would make sense.
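
As a sketch of how the numbers reconcile once the backend total time is surfaced, assuming the table's rate is computed over the backend-reported 26s (backendTotalTimeSeconds and recordsFromRate are hypothetical names for illustration):

  // Backend-reported total time for the stage, not shown in the UI today.
  const backendTotalTimeSeconds = 26;
  const processRate = 298_986;     // records/second, as shown in the table

  // Rate multiplied by the backend total time recovers the stage's actual
  // record count, roughly matching the ~7M on the graph rather than ~53M.
  const recordsFromRate = processRate * backendTotalTimeSeconds;
  console.log(recordsFromRate.toLocaleString()); // "7,773,636", i.e. ~7M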


Release Notes

None

Assignee

Trishka Fernandes

Reporter

Bhooshan Mogal

Labels

None

Docs Impact

None

UX Impact

None

Components

Priority

Major