Skip to content

[adapter] Batch-allocate user IDs to reduce persist writes during DDL#35310

Open
mtabebe wants to merge 1 commit intoMaterializeInc:mainfrom
mtabebe:ma/ddl/phase1-batch-ids
Open

[adapter] Batch-allocate user IDs to reduce persist writes during DDL#35310
mtabebe wants to merge 1 commit intoMaterializeInc:mainfrom
mtabebe:ma/ddl/phase1-batch-ids

Conversation

@mtabebe
Copy link
Contributor

@mtabebe mtabebe commented Mar 3, 2026

Problem:

Every DDL statement (CREATE TABLE, CREATE VIEW, etc.) performs a persist write + timestamp oracle call to allocate a single user ID. This represents one extra round trip to the timestamp oracle, in addition to the necessary catalog operation for DDL.

Solution:

  • Add IdPool struct with allocate(), allocate_many(), and refill() methods to pre-allocate user IDs in a contiguous range
  • Add USER_ID_POOL_BATCH_SIZE dyncfg (default 512) to control batch size
  • Add user_id_pool field to Coordinator initialized as empty
  • Add Coordinator wrapper methods allocate_user_id() and allocate_user_ids(count) that draw from the pool and auto-refill from catalog when exhausted
  • Replace all call sites that used the two-line pattern of get_catalog_write_ts() + catalog().allocate_user_id(ts) with the new pooled methods
  • This eliminates the timestamp oracle call, and catalog operation

Testing:

  • New IdPool unit tests covering empty pool, single/batch allocation, refill, mixed allocation, and invalid range panic
  • New SLT regression test (test/sqllogictest/id_pool.slt) exercising tables, views, materialized views, indexes, types, secrets, and drop/recreate cycles
  • New MZCompose restart test to ensure IDs remain unique across the restart

@mtabebe mtabebe requested a review from aljoscha March 3, 2026 16:15
@mtabebe mtabebe requested a review from a team as a code owner March 3, 2026 16:15
@github-actions
Copy link

github-actions bot commented Mar 3, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@mtabebe mtabebe force-pushed the ma/ddl/phase1-batch-ids branch from a0cba20 to 563d8af Compare March 3, 2026 19:30
let (connection_id, connection_gid) = match self.allocate_user_id().await {
Ok(item_id) => item_id,
Err(err) => return ctx.retire(Err(err.into())),
Err(err) => return ctx.retire(Err(err)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got hit with a lint issue about this, so tidying it up

@mtabebe mtabebe force-pushed the ma/ddl/phase1-batch-ids branch 2 times, most recently from c3dab3b to 092af75 Compare March 3, 2026 20:34
@mtabebe mtabebe requested a review from a team as a code owner March 3, 2026 20:34
Problem:
Every DDL statement (CREATE TABLE, CREATE VIEW, etc.) performs a
persist write + timestamp oracle call to allocate a single user ID.
This represents one extra round trip to the timestamp oracle, in
addition to the necessary catalog operation for DDL.

Solution:
- Add `IdPool` struct with `allocate()`, `allocate_many()`, and
  `refill()` methods to pre-allocate user IDs in a contiguous range
- Add `USER_ID_POOL_BATCH_SIZE` dyncfg (default 512) to control
  batch size
- Add `user_id_pool` field to `Coordinator` initialized as empty
- Add Coordinator wrapper methods `allocate_user_id()` and
  `allocate_user_ids(count)` that draw from the pool and auto-refill
  from catalog when exhausted
- Replace all call sites that used the two-line pattern of
  `get_catalog_write_ts()` + `catalog().allocate_user_id(ts)` with
  the new pooled methods
- This eliminates the timestamp oracle call, and catalog operation

Testing:
- New `IdPool` unit tests covering empty pool, single/batch
  allocation, refill, mixed allocation, and invalid range panic
- New SLT regression test (`test/sqllogictest/id_pool.slt`)
  exercising tables, views, materialized views, indexes, types,
  secrets, and drop/recreate cycles
- New MZCompose restart test to ensure IDs remain unique across
  the restart
@mtabebe mtabebe force-pushed the ma/ddl/phase1-batch-ids branch from 092af75 to d3ee143 Compare March 3, 2026 20:43
@def-
Copy link
Contributor

def- commented Mar 3, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants