Performance Evaluation
Setup
Hardware and runtime used for the measurements:
2.9 GHz Intel Core i5
16 GB 2133 MHz LPDDR3
Java 7
Light Data Transformation DMD
These are the high-level transformations being performed on the data:
Parsing CSV
Dropping columns
Setting default values on columns
Changing case
Masking data
Filtering rows based on an expression
Directives
parse-as-csv demo , true
drop demo
drop demo_12
fill-null-or-empty demo_11 N/A
uppercase demo_17
mask-number demo_18 xxx###
drop demo_6
drop demo_7
fill-null-or-empty demo_5 N/A
uppercase demo_3
filter-row-if-true demo_9 =~ "CA"
mask-number demo_10 xxx##
mask-shuffle demo_4
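To make the recipe above concrete, here is a minimal Python sketch of what most of these directives do to one already-parsed row. It is an illustration, not the Wrangler implementation. The function names are hypothetical, and the semantics are assumptions: in a mask pattern `#` keeps a digit and `x` hides it, `=~` is treated as a regular-expression match against the whole value, and `filter-row-if-true` drops the row when the expression is true. The `parse-as-csv` and `mask-shuffle` steps are left out.

```python
import re

def fill_null_or_empty(row, column, default):
    # Replace a missing, None, or empty value with the default.
    if row.get(column) in (None, ""):
        row[column] = default

def uppercase(row, column):
    # Upper-case the value if it is a string.
    if isinstance(row.get(column), str):
        row[column] = row[column].upper()

def mask_number(value, pattern):
    # Assumed semantics: '#' shows the next digit, 'x' hides it,
    # any other pattern character is copied through literally.
    digits = iter(re.sub(r"\D", "", str(value)))
    out = []
    for p in pattern:
        if p == "#":
            out.append(next(digits, ""))
        elif p == "x":
            next(digits, None)  # consume the digit without showing it
            out.append("x")
        else:
            out.append(p)
    return "".join(out)

def apply_recipe(row):
    """Return the transformed row, or None if the row is filtered out."""
    row = dict(row)  # work on a copy
    for col in ("demo_12", "demo_6", "demo_7"):
        row.pop(col, None)                       # drop
    fill_null_or_empty(row, "demo_11", "N/A")
    fill_null_or_empty(row, "demo_5", "N/A")
    uppercase(row, "demo_17")
    uppercase(row, "demo_3")
    if re.fullmatch("CA", row.get("demo_9") or ""):
        return None                              # filter-row-if-true
    for col, pattern in (("demo_18", "xxx###"), ("demo_10", "xxx##")):
        if row.get(col) is not None:
            row[col] = mask_number(row[col], pattern)
    return row

sample = {
    "demo_3": "smith", "demo_5": "", "demo_6": "a", "demo_7": "b",
    "demo_9": "NY", "demo_10": "12345", "demo_11": None,
    "demo_12": "c", "demo_17": "abc", "demo_18": "987654",
}
result = apply_recipe(sample)
print(result)
```

Each step is cheap and works on one row at a time, which is why this workload is described as a light transformation.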
Experiments
Two experiments were run: the first with about 13.5 million records and the second with about 81 million records.
Experiment #1
Number of records: 13,499,973
Number of bytes: 4,499,534,313 (~4.5 GB)
Number of columns: 18
Performance Numbers
count = 13,376,053
mean rate = 64998.50 records/second
1-minute rate = 64921.29 records/second
5-minute rate = 46866.70 records/second
15-minute rate = 36149.86 records/second
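The record rate can be turned into an approximate data rate using only the figures reported for this experiment. The short calculation below is a sketch: it assumes records are close to uniform in size, so that total bytes divided by total records gives a usable average. Under that assumption the input averages roughly 333 bytes per record, and the mean rate corresponds to about 21.7 MB/s.

```python
# Figures reported for Experiment #1.
records = 13_499_973
size_bytes = 4_499_534_313
mean_rate = 64_998.50  # records/second

# Average record size, assuming roughly uniform records.
bytes_per_record = size_bytes / records

# Approximate data throughput in decimal megabytes per second.
throughput_mb_s = mean_rate * bytes_per_record / 1e6

print(f"{bytes_per_record:.1f} bytes/record, {throughput_mb_s:.2f} MB/s")
```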
Experiment #2
Number of records: 80,999,838 (~81M)
Number of bytes: 26,997,205,878 (~27 GB)
Number of columns: 18
Total time: 1294 seconds (about 21.6 minutes)
Performance Numbers
count = 80,944,061
mean rate = 62465.93 records/second
1-minute rate = 62706.39 records/second
5-minute rate = 60755.41 records/second
15-minute rate = 56673.32 records/second
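As a sanity check, the reported total time and record count for Experiment #2 can be compared with the reported mean rate. The sketch below assumes the total time covers the same interval over which the mean rate was measured, which the report does not state explicitly. Under that assumption the implied rate is within about 0.2% of the reported mean.

```python
# Figures reported for Experiment #2.
count = 80_944_061
total_seconds = 1294
reported_mean_rate = 62_465.93  # records/second

# Rate implied by dividing processed records by wall-clock time.
implied_rate = count / total_seconds

# Relative difference between implied and reported mean rates.
relative_gap = abs(implied_rate - reported_mean_rate) / reported_mean_rate

minutes = total_seconds / 60

print(f"implied rate: {implied_rate:.2f} records/second")
print(f"relative gap: {relative_gap:.4%}")
print(f"total time:   {minutes:.2f} minutes")
```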
Created in 2020 by Google Inc.