
persist: Add Criterion benchmarks for envelope encryption overhead #35338

Closed
jasonhernandez wants to merge 11 commits into main from
kms-envelope-encryption-consensus-svc

Conversation

@jasonhernandez
Contributor

Summary

  • Adds Criterion micro-benchmarks for the hot-path AES-256-GCM crypto functions (encrypt_with_dek, decrypt_with_key, parse_envelope) across 4 payload sizes (256B, 4KiB, 64KiB, 1MiB)
  • Introduces a lib.rs to expose crypto internals to the benchmark harness, following the persist-client Criterion pattern
  • Confirms the <5µs claim for typical WAL batch sizes: 4KiB encrypt ~3.3µs, decrypt ~0.8µs, full roundtrip ~3.8µs
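The Criterion harness itself lives in the repo; as a self-contained illustration of the measurement loop, here is a std-only manual-timing analogue. The `encrypt_stub` XOR "cipher" is a hypothetical stand-in for `encrypt_with_dek` so the sketch compiles without the real AES-256-GCM dependency:

```rust
use std::time::Instant;

// Hypothetical stand-in for `encrypt_with_dek`: a trivial XOR keystream so
// the harness is self-contained without the real AES-256-GCM code.
fn encrypt_stub(key: u8, payload: &[u8]) -> Vec<u8> {
    payload.iter().map(|b| b ^ key).collect()
}

fn main() {
    // The same payload sizes the Criterion benchmarks sweep.
    for (label, size) in [("256B", 256usize), ("4KiB", 4 << 10), ("64KiB", 64 << 10), ("1MiB", 1 << 20)] {
        let payload = vec![0xABu8; size];
        let iters = 1_000u32;
        let mut sink = 0u8;
        let start = Instant::now();
        for _ in 0..iters {
            let ct = encrypt_stub(0x5A, &payload);
            // Fold output into `sink` so the optimizer cannot elide the work
            // (Criterion's `black_box` plays this role in the real benches).
            sink ^= ct[0];
        }
        let per_op = start.elapsed() / iters;
        println!("{label}: {per_op:?} per op (sink={sink})");
    }
}
```

Criterion adds warm-up, outlier rejection, and throughput reporting on top of this basic loop, which is why the real benchmark numbers below are the ones to trust.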

Benchmark Results

crypto/encrypt/256B     time: [2.92 µs]    thrpt: [83 MiB/s]
crypto/encrypt/4KiB     time: [3.33 µs]    thrpt: [1.14 GiB/s]
crypto/encrypt/64KiB    time: [14.5 µs]    thrpt: [4.2 GiB/s]
crypto/encrypt/1MiB     time: [143 µs]     thrpt: [6.8 GiB/s]

crypto/decrypt/256B     time: [226 ns]     thrpt: [1.1 GiB/s]
crypto/decrypt/4KiB     time: [813 ns]     thrpt: [4.7 GiB/s]
crypto/decrypt/64KiB    time: [11.9 µs]    thrpt: [5.1 GiB/s]
crypto/decrypt/1MiB     time: [144 µs]     thrpt: [6.8 GiB/s]

crypto/roundtrip/256B   time: [2.70 µs]    thrpt: [90 MiB/s]
crypto/roundtrip/4KiB   time: [3.83 µs]    thrpt: [1.0 GiB/s]
crypto/roundtrip/64KiB  time: [26.2 µs]    thrpt: [2.3 GiB/s]
crypto/roundtrip/1MiB   time: [325 µs]     thrpt: [3.0 GiB/s]

Test plan

  • cargo check -p mz-persist-consensus-svc --benches compiles
  • cargo test -p mz-persist-consensus-svc — all 34 tests pass
  • cargo bench -p mz-persist-consensus-svc — all 12 benchmarks produce results
  • Throughput numbers confirm <5µs for typical 4KiB WAL batches

🤖 Generated with Claude Code

pH14 and others added 11 commits March 5, 2026 08:01
Introduces two new components to batch independent cross-shard CAS writes
into a single durable S3 Express One Zone PUT per flush interval, making
cost O(1/batch_window) instead of O(shards):

- `RpcConsensus` (gRPC client implementing the `Consensus` trait)
- `persist-consensus-svc` (group commit service with actor-based state
  machine, WAL, snapshot, and recovery)

The actor processes CAS, truncate, scan, head, and list_keys commands on
a single thread. Writes are batched and flushed to S3 WAL periodically,
with snapshots every N batches for bounded recovery.
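
A std-only sketch of the group-commit loop described above (`CasWrite`, `run_actor`, and the 20ms window are illustrative names; the real actor also handles truncate, scan, head, and list_keys, and flushes to S3 rather than a closure):

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

// Hypothetical command type; the real actor carries full CAS semantics.
#[allow(dead_code)]
struct CasWrite { shard: String, payload: Vec<u8> }

// Accumulate independent cross-shard writes and emit one batch per flush
// window, making PUT cost O(1/batch_window) instead of O(shards).
fn run_actor(rx: Receiver<CasWrite>, flush_window: Duration, mut flush: impl FnMut(Vec<CasWrite>)) {
    let mut batch = Vec::new();
    loop {
        match rx.recv_timeout(flush_window) {
            Ok(cmd) => batch.push(cmd),
            Err(RecvTimeoutError::Timeout) => {
                if !batch.is_empty() {
                    flush(std::mem::take(&mut batch));
                }
            }
            Err(RecvTimeoutError::Disconnected) => {
                // Drain whatever is pending, then shut down.
                if !batch.is_empty() { flush(batch); }
                return;
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    // 50 independent shard writes arrive within one flush window...
    for i in 0..50 {
        tx.send(CasWrite { shard: format!("s{i}"), payload: vec![0u8; 8] }).unwrap();
    }
    drop(tx);
    let mut puts = 0;
    run_actor(rx, Duration::from_millis(20), |batch| {
        puts += 1;
        println!("one S3 PUT carrying {} writes", batch.len());
    });
    assert_eq!(puts, 1); // ...and leave as a single durable PUT
}
```
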

Includes 27 unit tests covering CAS semantics, group commit batching,
read operations, truncate, WAL integration, and snapshot intervals.
- Switch S3 client to mz_aws_util for proper HTTP client and
  virtual-hosted-style addressing (fixes S3 Express compatibility)
- Wrap runtime in LocalSet for spawn_local support
- Add info-level request logging to gRPC handlers and flush
- Replace explicit test shutdown calls with Drop-based cleanup
- Change default flush interval to 20ms, listen port to 6890
- Fix pyactivate to pin Python 3.13 for confluent-kafka wheel compat
- Prometheus metrics for the consensus service: operation counters
  (CAS committed/rejected, head, scan, truncate), S3 write counters
  and latency histograms (power-of-2 buckets from 1ms to 5s), flush
  histograms (ops/batch, shards/batch, latency), in-memory state
  gauges (active shards, entries, bytes). Served via axum HTTP on
  port 6891.

- Fix S3 retry logic: introduce WalWriteError enum that distinguishes
  Failed from AlreadyExists (412 PreconditionFailed /
  ConditionalRequestConflict). On retry, AlreadyExists means the
  original write landed — treat as success instead of error.
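
A sketch of that error classification (the names `WalWriteError` and `write_with_retry` mirror the commit message but the shape here is illustrative); the key point is that AlreadyExists observed on a retry means the original conditional PUT landed, so it maps to success:

```rust
#[derive(Debug, PartialEq)]
enum WalWriteError {
    // Transient failure: safe to retry the same conditional PUT.
    Failed,
    // 412 PreconditionFailed / ConditionalRequestConflict: the object
    // already exists, i.e. a previous attempt won the write.
    AlreadyExists,
}

fn write_with_retry(
    mut attempt: impl FnMut() -> Result<(), WalWriteError>,
    max_retries: u32,
) -> Result<(), WalWriteError> {
    for _ in 0..=max_retries {
        match attempt() {
            Ok(()) => return Ok(()),
            // Our earlier attempt succeeded before its response was lost.
            Err(WalWriteError::AlreadyExists) => return Ok(()),
            Err(WalWriteError::Failed) => continue,
        }
    }
    Err(WalWriteError::Failed)
}

fn main() {
    // Simulate: the first attempt times out (Failed); the retry then sees
    // AlreadyExists because the first PUT actually landed.
    let mut calls = 0;
    let result = write_with_retry(
        || {
            calls += 1;
            if calls == 1 { Err(WalWriteError::Failed) } else { Err(WalWriteError::AlreadyExists) }
        },
        3,
    );
    assert_eq!(result, Ok(()));
    println!("write succeeded after {calls} attempts");
}
```
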

- Grafana dashboard (auto-provisioned): 6 panels showing S3 PUTs/s
  (the money chart — stays flat), active shards, ops/s by type,
  S3 PUT latency p50/p99, ops per batch, and in-memory state.

- Prometheus scrape config for the consensus service.

- Demo harness: demo.sh creates a PG source with background inserts,
  then stages 5→20→50→100 materialized views to show shards scaling
  while S3 writes stay flat. cleanup.sh tears it all down.
…noise

- Set active_shards/total_entries/approx_bytes gauges in Actor::new()
  so metrics reflect recovered state immediately instead of showing
  zeros until the first flush.
- Rewrite demo.sh to be fully self-contained: auto-detects and starts
  Postgres (brew 14-18), creates database/table/publication, and
  cleans up on exit.
- Downgrade per-request gRPC handler logging from info to debug to
  reduce noise under load.
- Split S3 latency into separate WAL (p50/p99/p99.99) and Snapshot
  panels; add S3 bytes written/s panel; anchor S3 Writes/s y-axis at 0
- Move Grafana from port 3000 to 3001 to free port for Console
- Infinite retry with exponential backoff on WAL writes — only Ok and
  AlreadyExists are definite results, transient failures never
  propagate to clients
- Add 10-second stats heartbeat log to actor for demo visibility
- Demo: single-row UPDATE at 20Hz instead of INSERTs; trivial MVs
  (upper(val::text)); scale to 200 MVs; clean up stale PG state
- Add one-pager and demo talk track documentation
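
The "only Ok and AlreadyExists are definite results" policy above can be sketched as a std-only loop; `PutOutcome`, `write_durably`, and the 10ms-to-1s backoff bounds are illustrative, and the sleep is injected so the logic is testable:

```rust
use std::time::Duration;

// Possible outcomes of one conditional PUT attempt.
enum PutOutcome { Ok, AlreadyExists, Transient }

// Retry forever with capped exponential backoff; transient failures never
// propagate to clients, only the two definite outcomes end the loop.
fn write_durably(mut attempt: impl FnMut() -> PutOutcome, mut sleep: impl FnMut(Duration)) {
    let mut backoff = Duration::from_millis(10);
    loop {
        match attempt() {
            // AlreadyExists means an earlier attempt's PUT landed: success.
            PutOutcome::Ok | PutOutcome::AlreadyExists => return,
            PutOutcome::Transient => {
                sleep(backoff);
                backoff = (backoff * 2).min(Duration::from_secs(1));
            }
        }
    }
}

fn main() {
    // Three transient failures, then success; record the backoff schedule.
    let mut fails = 3;
    let mut slept = Vec::new();
    write_durably(
        || if fails > 0 { fails -= 1; PutOutcome::Transient } else { PutOutcome::Ok },
        |d| slept.push(d),
    );
    assert_eq!(
        slept,
        vec![Duration::from_millis(10), Duration::from_millis(20), Duration::from_millis(40)]
    );
    println!("retried {} times with backoff {:?}", slept.len(), slept);
}
```
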
Reword "The Insight" to "The Approach", remove mid-paragraph bolding,
add in-memory serving note, add S3 prefix isolation, add "What We Did
Not Build" section, add "Wait a second..." closing.
Add optional AES-256-GCM envelope encryption for all S3 WAL batches and
snapshots. A KMS-derived Data Encryption Key (DEK) is cached in memory
for fast local encryption; the KMS-wrapped copy is stored per-object so
each object is self-contained for decryption. DEK rotates in the
background on a configurable interval.

Enabled via --kms-key-id; when unset, data passes through unencrypted
(backward compatible). Decrypt uses a fast path when the wrapped DEK
matches the cached key, avoiding KMS calls during normal operation.
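
A self-contained sketch of what a per-object envelope and its parse might look like; the layout, field order, and lengths here are assumptions for illustration (the real parse_envelope defines the actual on-disk format), but it shows why each object is self-contained and how the fast path avoids KMS:

```rust
// Hypothetical envelope layout:
//   [u16 wrapped-DEK length][KMS-wrapped DEK][12-byte GCM nonce][ciphertext+tag]
#[derive(Debug)]
struct Envelope<'a> {
    wrapped_dek: &'a [u8],
    nonce: &'a [u8; 12],
    ciphertext: &'a [u8],
}

fn parse_envelope(buf: &[u8]) -> Option<Envelope<'_>> {
    let dek_len = u16::from_be_bytes([*buf.get(0)?, *buf.get(1)?]) as usize;
    let wrapped_dek = buf.get(2..2 + dek_len)?;
    let nonce: &[u8; 12] = buf.get(2 + dek_len..2 + dek_len + 12)?.try_into().ok()?;
    let ciphertext = buf.get(2 + dek_len + 12..)?;
    Some(Envelope { wrapped_dek, nonce, ciphertext })
}

fn main() {
    // Build an envelope with a placeholder for the KMS-wrapped blob.
    let wrapped = b"kms-wrapped-dek";
    let mut buf = Vec::new();
    buf.extend_from_slice(&(wrapped.len() as u16).to_be_bytes());
    buf.extend_from_slice(wrapped);
    buf.extend_from_slice(&[0u8; 12]); // nonce
    buf.extend_from_slice(b"ciphertext-and-tag");

    let env = parse_envelope(&buf).expect("well-formed envelope");
    assert_eq!(env.wrapped_dek, wrapped);
    assert_eq!(env.ciphertext, b"ciphertext-and-tag");
    // Fast path: the object's wrapped DEK matches the cached wrapped copy,
    // so the cached plaintext DEK can be reused without a KMS Decrypt call.
    let cached_wrapped: &[u8] = wrapped;
    assert!(env.wrapped_dek == cached_wrapped);
    println!("parsed envelope: {}B wrapped DEK, {}B ciphertext", env.wrapped_dek.len(), env.ciphertext.len());
}
```

Storing the wrapped DEK per object is what makes a rotated DEK safe: old objects still carry the wrapped key that encrypted them, and only a cache miss forces a KMS unwrap.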

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add micro-benchmarks for the hot-path AES-256-GCM crypto functions
(encrypt_with_dek, decrypt_with_key, parse_envelope) to quantify
encryption overhead and catch regressions. Benchmarks cover 256B,
4KiB, 64KiB, and 1MiB payloads — confirming <5µs for typical WAL
batch sizes (4KiB encrypt ~3.3µs, roundtrip ~3.8µs).

Introduces a lib.rs to expose crypto internals to the benchmark
harness, following the persist-client Criterion pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Mar 6, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

