TL;DR - The total time of the pipeline run, the time span of the time-series graph we show for records in a specific stage, and the ACTUAL time spent in that pipeline stage are all different. The UI doesn't show this distinction when the user tries to infer the # of records processed.
Longer version -
The individual stage metrics that we show in the UI don't give users the full picture they need to infer the right information.
1. Create a simple pipeline with a BQ source (~7M records) and a BQ sink
2. Publish the pipeline and run it
3. Once the pipeline run is complete, open the metrics of the individual stages
In this specific example,
After step 3, the UI shows a graph:
- X-axis indicates time (seconds) and spans 180 seconds
- Y-axis indicates the total # of records in (source)
From the graph, the user infers that the stage ran for 180 seconds and processed ~7M records.
Below the graph we show a table with metrics such as process rate (# of records per second) and processing time.
In this run it shows ~298,986 records processed per second.
When the user tries to correlate the data from the graph with the table, the numbers don't add up. Here is why:
- The graph shows the pipeline stage processing records for 180 seconds
- The table below shows a process rate of ~298,986 records per second
- By that calculation, if the stage processed ~298k records per second for 180 seconds, it should have processed roughly 298,986 × 180 ≈ 53.8M records, not the ~7M the graph shows (see the sketch below)
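A minimal sketch of that mismatch, using this run's approximate figures (variable names are illustrative, not actual metric names):

```python
# Approximate values from this example run, for illustration only.
rate_from_table = 298_986      # records/sec shown in the metrics table
graph_duration_s = 180         # time span on the graph's X-axis
records_in_source = 7_000_000  # approximate record count on the Y-axis

# Extrapolating the table's rate over the graph's time span gives a total
# far larger than the records the stage actually processed.
extrapolated = rate_from_table * graph_duration_s
print(f"{extrapolated:,} expected vs ~{records_in_source:,} shown")
# -> 53,817,480 expected vs ~7,000,000 shown
```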
We get the total time metric from the backend (26s in this specific example), but we don't surface it anywhere in the UI. If the user could see the total time actually spent in the specific stage, the rate of records processed would be evident: ~298,986 records/sec × 26s ≈ 7.77M records, which lines up with the ~7M records the stage actually processed.
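For comparison, the same sketch using the backend's 26-second total time (again with this run's approximate, illustrative figures) shows the numbers reconciling:

```python
rate_from_table = 298_986   # records/sec from the metrics table
backend_total_time_s = 26   # total stage time reported by the backend (not shown in the UI)

# Multiplying the table's rate by the backend's total time, rather than the
# graph's 180-second span, matches the records the stage actually processed.
reconciled = rate_from_table * backend_total_time_s
print(f"{reconciled:,} records")
# -> 7,773,636 records, i.e. roughly the ~7M from the source
```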