BigQuery performance improvements and bugfixes#3782
Conversation
❌ 2 Tests Failed:
return fmt.Errorf("failed to get downscoped token for bucket %s with prefix %s: %w", bucketName, prefix, err)
}

urlHeaders.Set("Authorization", "Bearer "+token.Value)
The header situation is a bit mind-bending. They're queued along with each object, but are actually a reference to the same map, and then object sync assumes they're the same within a batch? Wonder if there's a way to have one object (per snapshot) that everyone calls and that is always up to date.
Like a callback function to retrieve headers? I think it could make sense.
The idea behind making this unique per object was to keep the object sync mechanism generic.
I would leave it as a potential refactor item.
We only have one implementation so it's ok to change expectations to fit BigQuery. If that changes for another object source, we can revisit
good bot claude https://github.com/PeerDB-io/peerdb/compare/customer-kuba-bq-improvements...header-provider?expand=1
Let's follow up in a separate PR if we want to.
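The callback idea discussed in this thread could look roughly like the following sketch. This is a hypothetical illustration, not PeerDB's actual API: the names `HeaderProvider`, `Object`, and `syncObject` are made up, and the point is only that each object resolves headers lazily at sync time instead of carrying a shared map captured at queue time.

```go
package main

import (
	"fmt"
	"net/http"
)

// HeaderProvider returns current headers on demand, so every object in a
// snapshot sees up-to-date credentials instead of a map shared by reference.
// (Hypothetical sketch, not PeerDB's real interface.)
type HeaderProvider func() (http.Header, error)

// Object pairs a URL with a lazily resolved header callback.
type Object struct {
	URL     string
	Headers HeaderProvider
}

// syncObject resolves headers at sync time, right before use.
func syncObject(obj Object) error {
	headers, err := obj.Headers()
	if err != nil {
		return fmt.Errorf("failed to get headers for %s: %w", obj.URL, err)
	}
	fmt.Printf("syncing %s with %d header(s)\n", obj.URL, len(headers))
	return nil
}

func main() {
	token := "example-token" // in practice this would be refreshed per call
	provider := func() (http.Header, error) {
		h := http.Header{}
		h.Set("Authorization", "Bearer "+token)
		return h, nil
	}
	obj := Object{URL: "gs://bucket/object", Headers: provider}
	if err := syncObject(obj); err != nil {
		panic(err)
	}
}
```

One provider per snapshot keeps the sync mechanism generic while guaranteeing all objects in a batch see consistent, current headers.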
Can the row counting change be captured in the PR description? It's a big departure from the interface that may show up in monitoring/stats.
Also, it looks like the test failure was a flake, but it's a BQ source flake that has been around for a while: https://github.com/PeerDB-io/peerdb/actions/runs/20774322818/job/59656549068?pr=3782
ilidemi left a comment
Looking good, only a few comments
if i > 0 {
	urlList.WriteString(",")
}
urlList.WriteString(obj.URL)
Add a defensive check that the URL doesn't contain ,{}? Based on the current implementation it should be ok (bucket name restrictions + url.PathEscape), but that needed tracking down.
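A defensive check along these lines might look like the sketch below. The helper name `validateExportURL` is hypothetical, not part of the PR; it only shows rejecting the characters that would break a comma-separated, brace-wrapped URL list.

```go
package main

import (
	"fmt"
	"strings"
)

// validateExportURL rejects object URLs containing characters that would
// corrupt a comma-joined, brace-wrapped URL list.
// (Illustrative sketch; not the PR's actual implementation.)
func validateExportURL(u string) error {
	if strings.ContainsAny(u, ",{}") {
		return fmt.Errorf("object URL %q contains reserved characters ,{}", u)
	}
	return nil
}

func main() {
	fmt.Println(validateExportURL("gs://bucket/path/file.avro"))
	fmt.Println(validateExportURL("gs://bucket/bad,name"))
}
```

The first call passes; the second returns an error because the URL contains a comma.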
Two major improvements:

- The EXPORT DATA query statement is built directly, instead of going through the SDK API.
- totalRows is no longer returned by the BigQuery source connector, as it is inaccurate information. The synced total rows should be taken into account instead.

Plus several bugfixes, mostly for handling data types like Array of Record.
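Building the EXPORT DATA statement directly might look like the following minimal sketch. The function name, table, and URI values are made up for illustration; the PR's actual query builder will differ, and a real implementation would need to escape or validate the interpolated identifiers.

```go
package main

import "fmt"

// buildExportDataQuery assembles a BigQuery EXPORT DATA statement as a plain
// SQL string, instead of using the SDK's extraction-job API.
// (Illustrative sketch; identifiers and options are assumptions.)
func buildExportDataQuery(gcsURI, table string) string {
	return fmt.Sprintf(
		"EXPORT DATA OPTIONS (uri = '%s', format = 'AVRO') AS SELECT * FROM `%s`",
		gcsURI, table,
	)
}

func main() {
	q := buildExportDataQuery("gs://bucket/prefix/*.avro", "project.dataset.table")
	fmt.Println(q)
}
```

Issuing the statement as a regular query gives direct control over export options and avoids extraction-job API overhead.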