The following benchmarks were measured using the sales_sample_bigquery.json config file (100M rows generated) and the default Dataflow machine type (n1-standard-1).

Runtime: 25 min

Workers used: 72 (a resource quota limited the worker count; otherwise more would have been used)
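For context, these figures work out to roughly 67k rows/s for the whole job. A quick back-of-the-envelope sketch (exact per-worker numbers depend on autoscaling and worker startup time, which are not broken out here):

```python
# Throughput implied by the benchmark figures above.
rows = 100_000_000      # rows generated
runtime_s = 25 * 60     # 25 min runtime
workers = 72            # Dataflow workers at peak

overall = rows / runtime_s        # rows per second for the whole pipeline
per_worker = overall / workers    # rough rows per second per worker

print(f"{overall:,.0f} rows/s overall, ~{per_worker:,.0f} rows/s per worker")
```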
Create the BigQuery table in which the generated data will be stored:

```shell
bq mk \
-t \
--schema 'id:STRING,date:DATETIME,name:STRING,phone_number:STRING,product_id:STRING,country:STRING,amount:INTEGER,price:FLOAT,customer_satisfaction:FLOAT' \
--time_partitioning_field date \
--time_partitioning_type DAY \
bigdata_sample.data-generator-test1
```
| Field | Generation type | Runtime (ms) |
|---|---|---|
| id | UUID | 26 |
| date | RANDOM_BETWEEN(2 dates) | 103 |
| name | RANDOM_BETWEEN(string) | 24 |
| phone_number | RANDOM_FROM_REGEX | 299 |
| product_id | RANDOM_FROM_LIST | 16 |
| country | RANDOM_FROM_LIST | 7 |
| amount | RANDOM_BETWEEN(int) | 4 |
| price | LOOKUP_VALUE | 3 |
| customer_satisfaction | RANDOM_BETWEEN(float) | 12 |
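As a rough illustration of what the benchmarked generation types do, here is a standard-library sketch. This is not the generator's actual implementation; the field values, list contents, and date range below are made up for the example:

```python
import random
import uuid
from datetime import datetime, timedelta

def gen_uuid() -> str:
    # UUID: e.g. the id field.
    return str(uuid.uuid4())

def random_between_dates(start: datetime, end: datetime) -> datetime:
    # RANDOM_BETWEEN(2 dates): uniform datetime in [start, end].
    span = (end - start).total_seconds()
    return start + timedelta(seconds=random.uniform(0, span))

def random_from_list(choices):
    # RANDOM_FROM_LIST: e.g. product_id, country.
    return random.choice(choices)

def random_between(lo, hi, as_int=False):
    # RANDOM_BETWEEN(int/float): e.g. amount, customer_satisfaction.
    return random.randint(lo, hi) if as_int else random.uniform(lo, hi)

# One synthetic row using made-up value ranges.
row = {
    "id": gen_uuid(),
    "date": random_between_dates(datetime(2019, 1, 1), datetime(2020, 1, 1)),
    "country": random_from_list(["US", "DE", "JP"]),
    "amount": random_between(1, 100, as_int=True),
    "customer_satisfaction": random_between(0.0, 5.0),
}
print(row)
```

The relative costs in the table line up with intuition: drawing from a fixed list or looking up a value is cheap, while regex-driven generation (phone_number) and date arithmetic are the expensive fields.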


