adapter/ddl-perf: eliminate O(n^2) operation replay in DDL transactions #35303
Draft
aljoscha wants to merge 4 commits into MaterializeInc:main from
Conversation
DDL transactions (e.g., `BEGIN; CREATE TABLE FROM SOURCE t1 ...; CREATE TABLE FROM SOURCE t2 ...; COMMIT`) replayed ALL previously executed operations for every new statement. For N statements, total work was 1 + 2 + ... + N = O(N^2). Session 23 measured per-table cost tripling from 104ms (batch=10) to 328ms (batch=500).

Root cause: `catalog_transact_with_ddl_transaction` built `all_ops = txn_ops + new_ops + TransactionDryRun` and processed everything from scratch through the full `catalog.transact()` pipeline each time.

Fix: instead of replaying all ops, process only the NEW ops against the accumulated `CatalogState` from the previous dry run. A new method, `transact_incremental_dry_run`, opens a fresh storage transaction, advances the OID allocator past previously allocated OIDs, runs `transact_inner` with only the new ops against the accumulated state, and drops the transaction without committing. This reduces per-statement work from O(N) to O(1), making the overall transaction O(N) instead of O(N^2).

## Experiments

Setup: optimized build, PostgreSQL source with ~2500 upstream tables, ~6900 existing objects in Materialize. Each batch runs `CREATE TABLE ... FROM SOURCE` N times inside a single `BEGIN`/`COMMIT` DDL transaction. Results averaged over 2 repetitions per batch size.

**Per-table cost (ms/table):**

| batch_size | Baseline | With Fix | Speedup |
|------------|----------|----------|---------|
| 1          | 273      | 237      | 1.15x   |
| 5          | 161      | 121      | 1.33x   |
| 10         | 147      | 106      | 1.39x   |
| 25         | 146      | 98       | 1.49x   |
| 50         | 131      | 93       | 1.41x   |
| 100        | 154      | 86       | 1.79x   |
| 200        | 225      | 88       | 2.56x   |
| 300        | 296      | 90       | 3.29x   |
| 500        | 451      | 90       | 5.01x   |

**Total transaction time (ms):**

| batch_size | Baseline | With Fix | Speedup |
|------------|----------|----------|---------|
| 1          | 273      | 237      | 1.15x   |
| 5          | 811      | 606      | 1.34x   |
| 10         | 1,477    | 1,064    | 1.39x   |
| 25         | 3,674    | 2,463    | 1.49x   |
| 50         | 6,597    | 4,694    | 1.41x   |
| 100        | 15,515   | 8,647    | 1.79x   |
| 200        | 45,110   | 17,758   | 2.54x   |
| 300        | 89,073   | 27,408   | 3.25x   |
| 500        | 225,734  | 45,327   | 4.98x   |

Baseline per-table cost grows with batch size due to the O(n²) replay (131ms at batch=50 → 451ms at batch=500, a 3.4x increase). With the fix, per-table cost is constant at ~86–98ms regardless of batch size. At batch=500 this is a 5x total speedup (226s → 45s).
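The quadratic blow-up is easy to see with a small cost model. This is a sketch, not Materialize code: it only counts how many ops each strategy pushes through the transact pipeline for a transaction of `n` single-op statements.

```rust
// Hypothetical cost model (not the actual Materialize implementation).
// Under replay-all, statement i reprocesses all i ops seen so far, so a
// transaction of n statements touches 1 + 2 + ... + n ops in total.
fn replay_all_total_ops(n: usize) -> usize {
    (1..=n).sum()
}

// Under the incremental dry run, each statement processes only its own
// new op, so total work is linear in n.
fn incremental_total_ops(n: usize) -> usize {
    n
}

fn main() {
    // At batch=500, replay-all touches 125,250 ops vs 500 incrementally,
    // matching the O(N^2) vs O(N) shape seen in the experiments below.
    assert_eq!(replay_all_total_ops(500), 125_250);
    assert_eq!(incremental_total_ops(500), 500);
    println!(
        "replay-all: {}, incremental: {}",
        replay_all_total_ops(500),
        incremental_total_ops(500)
    );
}
```

The model also predicts the observed shape of the per-table numbers: constant per-statement cost with the fix, linearly growing per-statement cost without it.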
Force-pushed from 416bdac to 33c249e
Force-pushed from 144c60f to 4210d74
The incremental dry run created a fresh durable transaction for each statement but processed ops against an accumulated `CatalogState` from previous dry runs. For the second and subsequent statements, the transaction (reflecting durable storage) and the state (reflecting accumulated changes) were out of sync, causing "retraction does not match existing value" panics when applying diffs.

Fix: after each dry run, export the transaction's current state as a `Snapshot` (`Transaction::current_snapshot`). On the next dry run, initialize the transaction from this saved snapshot via `DurableCatalogState::transaction_from_snapshot`, keeping the transaction and state in sync. This preserves the O(1)-per-statement (O(N) overall) optimization.
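The snapshot hand-off can be illustrated with a toy model using hypothetical types (the real code uses `Transaction::current_snapshot` and `DurableCatalogState::transaction_from_snapshot`): a diff's retraction must match the value the state currently holds, so each dry run must start from the previous dry run's end state rather than from durable storage.

```rust
use std::collections::HashMap;

// Toy stand-ins for the catalog types; the real Snapshot is richer.
type Snapshot = HashMap<String, i64>;

// Applying a diff: an optional retraction of the old value plus an
// insertion of the new one. The retraction must match what the state
// currently holds, mirroring the panic condition described above.
fn apply_diff(
    state: &mut Snapshot,
    key: &str,
    retract: Option<i64>,
    insert: i64,
) -> Result<(), String> {
    if let Some(old) = retract {
        match state.get(key) {
            Some(v) if *v == old => {}
            other => {
                return Err(format!(
                    "retraction does not match existing value: {:?}",
                    other
                ))
            }
        }
    }
    state.insert(key.to_string(), insert);
    Ok(())
}

fn main() {
    // Dry run 1 starts from durable storage and creates t1 (version 1).
    let durable: Snapshot = HashMap::new();
    let mut tx1 = durable.clone();
    apply_diff(&mut tx1, "t1", None, 1).unwrap();
    let saved = tx1.clone(); // analogue of Transaction::current_snapshot

    // A dry run 2 initialized from durable storage is out of sync:
    // retracting t1=1 fails because storage never saw dry run 1's change.
    let mut stale = durable.clone();
    assert!(apply_diff(&mut stale, "t1", Some(1), 2).is_err());

    // Initializing from the saved snapshot keeps tx and state in sync.
    let mut tx2 = saved.clone();
    assert!(apply_diff(&mut tx2, "t1", Some(1), 2).is_ok());
    println!("ok");
}
```

The design point is that the snapshot is purely in-memory: nothing is committed to durable storage until the final `COMMIT`, which is what keeps the dry runs cheap.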
Force-pushed from 4210d74 to 28500ee
aljoscha (author): @def- could you maybe take a peek at the nightly there? Feels like they're all unrelated to this change but I didn't want to just ignore them.