Before writing any code, read these three documents — they are the source of truth:
- docs/HARNESS.md — 4-layer verification loop, expected outputs, Tcl harness pitfalls, command addition checklist
- docs/RUST_STYLE.md — Rust coding standards: error handling, file size limits, clone avoidance, iterator patterns, assertion requirements, checked arithmetic
- docs/DST_GUIDE.md — Deterministic simulation testing methodology: SimulatedRng, VirtualTime, buggify fault injection, shadow state comparison
Violations of RUST_STYLE.md or DST_GUIDE.md will be caught by code review and CI. Do not skip them.
Invoke these skills to inject domain expertise before working on specific areas:
| Skill | When to use |
|---|---|
| /distributed-systems | Before working on replication, gossip, CRDTs, or multi-node code |
| /actor-model | Before working on sharded_actor, connection handler, or message passing |
| /dst | Before writing or modifying DST tests, fault injection, or shadow state |
| /tigerstyle | Before modifying executor or data structure code (assertions, checked arithmetic) |
| /rust-dev | Before writing any Rust code in this project (style, conventions, patterns) |
| /formal-verification | Before working on TLA+ specs, Stateright models, Kani proofs, or Maelstrom |
Skills are defined in .claude/agents/ and are invocable via /skill-name in Claude Code.
For EVERY code change, follow this order:
Before writing any code:
- Identify which components are affected
- Check if new I/O abstractions are needed for testability
- Determine if actor boundaries need modification
- Document the approach in a brief plan
Every function/method must have:
- `debug_assert!` for preconditions at entry
- `debug_assert!` for postconditions/invariants after mutations
- Checked arithmetic (use `checked_add`, `checked_sub`, etc.)
- Explicit error handling (no `.unwrap()` in production paths)
- Early returns for error cases
Every new component must be:
- Simulatable: All I/O through trait abstractions
- Deterministic: No hidden randomness, use `SimulatedRng`
- Time-controllable: Use `VirtualTime`, not `std::time`
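A minimal sketch of a component that satisfies all three constraints. The `Clock` and `Rng` traits here are simplified stand-ins for the abstractions in src/io/mod.rs, and `ExpiryChecker` is a hypothetical component, not project code:

```rust
use std::time::Duration;

// Simplified stand-ins for the project's I/O abstractions.
pub trait Clock {
    fn now_millis(&self) -> u64;
}

pub trait Rng {
    fn next_u64(&mut self) -> u64;
}

// Generic over its I/O, so tests can plug in VirtualTime / SimulatedRng
// while production wires up the real clock and RNG.
pub struct ExpiryChecker<C: Clock, R: Rng> {
    pub clock: C,
    pub rng: R,
}

impl<C: Clock, R: Rng> ExpiryChecker<C, R> {
    // Time comes only from the injected clock, never std::time.
    pub fn is_expired(&self, deadline_millis: u64) -> bool {
        self.clock.now_millis() >= deadline_millis
    }

    // Jitter comes only from the injected RNG, never thread_rng().
    pub fn jitter(&mut self, base: Duration) -> Duration {
        base + Duration::from_millis(self.rng.next_u64() % 100)
    }
}
```

Because the component never names a concrete clock or RNG, a test can freeze time or fix the random stream and get the same result on every run.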
- Unit tests for pure logic
- DST tests with fault injection for I/O components
- Multiple seeds (minimum 10) for simulation tests
- Redis equivalence tests for command behavior changes
- Use Zipfian distribution (not uniform) for realistic workload tests
- State invariants must be checkable at any point
- Add `verify_invariants()` methods to stateful structs
- Run invariant checks in debug builds after every mutation
Note: VOPR (Viewstamped Operation Replicator) is TigerBeetle's specific DST tool for testing their Viewstamped Replication consensus protocol — it is NOT a synonym for DST in general. Our DST methodology follows the FoundationDB approach (seed-based determinism, simulated I/O, fault injection). The verify_invariants() pattern comes from TigerStyle / Design-by-Contract, not VOPR.
This project follows Simulation-First Development inspired by FoundationDB and TigerBeetle. The core principle: if you can't simulate it, you can't test it properly.
All I/O operations must go through abstractions that can be:
- Simulated: Deterministic, controllable behavior
- Fault-injected: Network partitions, disk failures, message drops
- Time-controlled: Fast-forward time, test timeout scenarios
```rust
// GOOD: I/O through trait abstraction
trait ObjectStore: Send + Sync {
    async fn put(&self, key: &str, data: &[u8]) -> Result<()>;
    async fn get(&self, key: &str) -> Result<Vec<u8>>;
}

// BAD: Direct I/O that can't be simulated
std::fs::write(path, data)?;
```

Components communicate via message passing, not shared mutable state:
```rust
// GOOD: Actor owns state exclusively
struct PersistenceActor {
    state: PersistenceState, // Owned, not shared
    rx: mpsc::Receiver<Message>,
}

// BAD: Shared state with mutex
struct SharedPersistence {
    state: Arc<Mutex<PersistenceState>>, // Contention, hard to test
}
```

Assertions (REQUIRED for every mutation):
```rust
// GOOD: Assert preconditions and postconditions
fn hincrby(&mut self, field: &str, increment: i64) -> Result<i64> {
    debug_assert!(!field.is_empty(), "Precondition: field must not be empty");
    let new_value = self.value.checked_add(increment)
        .ok_or_else(|| Error::Overflow)?;
    self.value = new_value;
    debug_assert_eq!(self.get(field), Some(new_value),
        "Postcondition: value must equal computed result");
    Ok(new_value)
}

// BAD: No assertions, silent failures
fn hincrby(&mut self, field: &str, increment: i64) -> i64 {
    self.value += increment; // Can overflow!
    self.value
}
```

Checked Arithmetic (REQUIRED):
- Use `checked_add`, `checked_sub`, `checked_mul`, `checked_div`
- Return explicit errors on overflow, never wrap silently
- Use `saturating_*` only when saturation is the correct behavior
Control Flow:
- Prefer early returns, avoid deep nesting
- No hidden allocations - be explicit about `Vec::with_capacity`
- No panics in production paths, explicit `Result<T, E>`
Naming:
- Functions that can fail: return `Result<T, E>`
- Functions that assert: use `debug_assert!`
- Unsafe blocks: document why they're safe
Every stateful struct should be verifiable:
```rust
// GOOD: Struct with verification method
impl RedisSortedSet {
    /// Verify all invariants hold - call in debug builds after mutations
    fn verify_invariants(&self) {
        debug_assert_eq!(
            self.members.len(),
            self.sorted_members.len(),
            "Invariant: members and sorted_members must have same length"
        );
        debug_assert!(
            self.is_sorted(),
            "Invariant: sorted_members must be sorted by (score, member)"
        );
        for (member, _) in &self.sorted_members {
            debug_assert!(
                self.members.contains_key(member),
                "Invariant: every sorted member must exist in members map"
            );
        }
    }

    pub fn add(&mut self, member: String, score: f64) {
        // ... mutation logic ...
        #[cfg(debug_assertions)]
        self.verify_invariants();
    }
}
```

Design-by-Contract Checklist for New Structs:
- Define all invariants in comments
- Implement a `verify_invariants()` method
- Call verification after every public mutation method
- Add `#[cfg(debug_assertions)]` to avoid release overhead
Systems must remain stable under partial failures:
- Graceful degradation when dependencies fail
- Bounded queues with backpressure
- Timeouts on all external operations
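These properties can be illustrated with std primitives alone. This is a sketch, not project code: the real server would use the tokio equivalents (a bounded `mpsc::channel(cap)` plus `tokio::time::timeout`), and `bounded_queue_demo` is an illustrative name:

```rust
use std::sync::mpsc::{sync_channel, RecvTimeoutError, TrySendError};
use std::time::Duration;

/// Demonstrates a bounded queue (backpressure) and a timed receive.
/// Returns (hit_backpressure, timed_out).
pub fn bounded_queue_demo() -> (bool, bool) {
    // Bounded queue: try_send fails fast instead of growing an
    // unbounded backlog when the consumer falls behind.
    let (tx, rx) = sync_channel::<u32>(2);
    assert!(tx.try_send(1).is_ok());
    assert!(tx.try_send(2).is_ok());
    let hit_backpressure = matches!(tx.try_send(3), Err(TrySendError::Full(_)));

    // Drain the queue, then show a bounded wait instead of blocking forever.
    assert!(matches!(rx.recv(), Ok(1)));
    assert!(matches!(rx.recv(), Ok(2)));
    let timed_out = matches!(
        rx.recv_timeout(Duration::from_millis(50)),
        Err(RecvTimeoutError::Timeout)
    );
    (hit_backpressure, timed_out)
}
```

The producer learns about overload immediately (shed load, retry, or degrade) instead of silently queueing work it can never finish.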
Read docs/HARNESS.md first. It documents the full 4-layer verification loop, expected outputs, Tcl harness pitfalls, and the command addition checklist. Everything below is supplementary.
Priority Order (most to least important):
```rust
#[test]
fn test_with_fault_injection() {
    for seed in 0..50 { // Minimum 10 seeds, prefer 50+
        let mut harness = SimulationHarness::new(seed);
        harness.enable_chaos(); // Random faults

        // Run test scenario
        harness.run_scenario(|ctx| async {
            // Test logic here
        });

        harness.verify_invariants();
    }
}
```

DST tests MUST:
- Run with multiple seeds (minimum 10, ideally 50+)
- Include fault injection (network drops, delays, failures)
- Verify invariants after each operation
- Be deterministic given the same seed
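The determinism requirement boils down to: same seed, same stream. As a toy illustration (not the project's `SimulatedRng`, which is the real abstraction), here is a seed-deterministic linear congruential generator using Knuth's MMIX constants:

```rust
/// Toy seed-deterministic RNG: a linear congruential generator.
/// Illustrative only; use SimulatedRng in real DST tests.
pub struct SimRng {
    state: u64,
}

impl SimRng {
    pub fn new(seed: u64) -> Self {
        // Avoid the all-zero fixed point without changing other seeds.
        Self { state: seed.max(1) }
    }

    pub fn next_u64(&mut self) -> u64 {
        // Knuth MMIX LCG constants; wrapping arithmetic is intentional
        // here (the generator works modulo 2^64 by design).
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }
}
```

Two instances built from the same seed emit identical streams forever, which is exactly what makes a failing simulation reproducible from its seed alone.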
- Test pure logic without I/O
- Use `InMemoryObjectStore` for storage tests
- Include edge cases: empty inputs, max values, overflow
- Test component interactions
- Use real async runtime but simulated I/O
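The kind of in-memory store these storage tests rely on can be sketched as follows. This version is synchronous for brevity (the project's actual `ObjectStore` trait is async), and `InMemoryStore` is an illustrative stand-in for `InMemoryObjectStore`:

```rust
use std::collections::HashMap;

/// Simplified, synchronous sketch of an in-memory object store for tests.
#[derive(Default)]
pub struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryStore {
    /// Store a copy of the payload under the key.
    pub fn put(&mut self, key: &str, data: &[u8]) {
        self.objects.insert(key.to_string(), data.to_vec());
    }

    /// Borrow the stored payload, if present.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }
}
```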
Run the official Redis test suite against our implementation:
```bash
./scripts/run-redis-compat.sh                             # default test files
./scripts/run-redis-compat.sh unit/type/string            # specific test file
./scripts/run-redis-compat.sh unit/type/incr unit/expire  # multiple files
```

IMPORTANT: Tcl Harness Pitfalls (read before modifying commands)
The Tcl harness crashes the entire test file on unimplemented commands. A single `ERR unknown command 'FOO'` kills all remaining tests in that file. This means:
- Adding a stub that returns an error is better than not parsing at all. Parse unknown commands as `Command::Unknown(name)` so they return a clean error instead of crashing the RESP parser.
- Error message format matters. The Tcl tests use `assert_error "*pattern*"` with glob matching. Redis uses these exact formats:
  - `ERR wrong number of arguments for 'xxx' command`
  - `ERR value is not an integer or out of range`
  - `ERR value is not a valid float`
  - `ERR syntax error`
  - `WRONGTYPE Operation against a key holding the wrong kind of value`
- Double ERR prefix bug. `connection_optimized.rs::encode_error_into` prepends `ERR` to messages. If your parser error string already starts with `ERR`, the client sees `ERR ERR ...`. The function now checks for this, but keep it in mind.
- `perf_config.toml` affects tests. The root config has `num_shards = 1`, which is required for Tcl tests (MULTI/EXEC and Lua scripts need all keys on one shard). The `docker-benchmark/perf_config.toml` has `num_shards = 16` for throughput. Do not change the root config to multi-shard without understanding the consequences for transactions and Lua.
- `max_size` in `perf_config.toml` limits request size. It was previously 1MB, which caused `string.tcl` to crash on 4MB payloads. It is now 512MB. If you see "buffer overflow" in server logs, check this value.
- MULTI/EXEC state lives at the connection level (`connection_optimized.rs`), not in the per-shard executor. The executor still has transaction state for the simulation/DST path, but the production server intercepts MULTI/EXEC/DISCARD/WATCH before routing to shards.
- WATCH uses GET-snapshot comparison. At WATCH time, the connection does `GET key` and stores the result. At EXEC time, it does `GET key` again and compares. If different, EXEC returns a nil array. This is correct but differs from Redis's internal dirty-flag approach.
- Shard-aggregated commands. DBSIZE, SCAN, KEYS, EXISTS, FLUSHDB/FLUSHALL, and DEL are handled specially in `sharded_actor.rs` to fan out across all shards. If you add a new command that needs to see all keys, add aggregation there.
- The TIME command returns real wall-clock time via `SystemTime::now()` at the sharded_actor level, not virtual time from the executor.
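The error-format and `map_err` points above can be sketched in a few lines. `parse_increment` is a hypothetical helper, but the error string is the exact Redis format the Tcl tests glob-match:

```rust
/// Hypothetical parsing helper showing the map_err pattern.
fn parse_increment(raw: &str) -> Result<i64, String> {
    raw.parse::<i64>()
        // A raw Rust error ("invalid digit found in string") would fail
        // the Tcl assert_error patterns; map to the exact Redis string.
        .map_err(|_| "ERR value is not an integer or out of range".to_string())
}
```

Note the error string already carries the `ERR` prefix, so the double-prefix check in `encode_error_into` matters for strings like this one.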
Current Tcl compatibility status:
| Suite | Pass | Blocker |
|---|---|---|
| unit/type/incr | 28/28 | - |
| unit/expire | all | - |
| unit/type/string | 72 pass, 0 err | LCS command not implemented (crashes test file) |
| unit/multi | 20/56 | SWAPDB (database swapping not implemented) |
| ACL (--features acl) | 536/536 lib tests | ACL DST + 21 unit tests pass; Tcl acl.tcl/acl-v2.tcl blocked on ACL LOG, DRYRUN |
To add a new command without breaking the harness:
- Add a variant to the `Command` enum in `command.rs`
- Update the `get_primary_key()`, `get_keys()`, `name()`, and `is_read_only()` match arms
- Add parsing in BOTH `parser.rs` AND `commands.rs` (zero-copy parser). Use `map_err` to return Redis-compatible error strings, never raw Rust parse errors.
- Add execution in the `executor/mod.rs` dispatch + the appropriate `*_ops.rs` file
- Add DST coverage in `executor_dst.rs` with shadow state tracking (see `/dst` skill section 8)
- If the command needs all-shard visibility, add aggregation in `sharded_actor.rs`
- If adding `keepttl` or similar fields to `Command::Set`, update ALL struct literal constructions (there are ~25+ across test files, script_ops.rs, etc.)
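A condensed sketch of the first two checklist steps, using a hypothetical STRLEN-style variant. The real `Command` enum in command.rs has many more variants and match arms; this only shows the shape of the change:

```rust
/// Drastically reduced Command enum for illustration.
pub enum Command {
    Get { key: String },
    Strlen { key: String }, // the newly added variant
}

impl Command {
    pub fn name(&self) -> &'static str {
        match self {
            Command::Get { .. } => "GET",
            Command::Strlen { .. } => "STRLEN", // new match arm
        }
    }

    pub fn get_primary_key(&self) -> Option<&str> {
        match self {
            // New variant added to the existing key-bearing arm.
            Command::Get { key } | Command::Strlen { key } => Some(key.as_str()),
        }
    }

    pub fn is_read_only(&self) -> bool {
        match self {
            Command::Get { .. } | Command::Strlen { .. } => true,
        }
    }
}
```

Because every match is exhaustive, the compiler flags each arm that still needs the new variant, which is what makes this step mechanical rather than error-prone.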
- Use Maelstrom for distributed correctness
- Test under network partitions
- Verify consistency guarantees
- Run: `./maelstrom/maelstrom test -w lin-kv --bin ./target/release/maelstrom_kv_replicated`
Use Zipfian distribution for key access patterns in DST tests:
```rust
// GOOD: Realistic hot/cold key pattern
KeyDistribution::Zipfian { num_keys: 100, skew: 1.0 }

// BAD: Uniform distribution (unrealistic)
KeyDistribution::Uniform { num_keys: 100 }
```

With skew=1.0, the top 10 keys receive ~40% of accesses (like real workloads).
| File | Purpose |
|---|---|
| src/io/mod.rs | I/O abstractions (Clock, Network, RNG) |
| src/simulator/ | DST harness and fault injection |
| src/simulator/dst_integration.rs | DST with Zipfian distribution |
| src/streaming/simulated_store.rs | Fault-injectable object store |
| src/buggify/ | Probabilistic fault injection |
| tests/redis-tests/ | Official Redis Tcl test suite (git submodule) |
| scripts/run-redis-compat.sh | Wrapper to run Tcl tests against our server |
| src/security/acl/ | ACL system (users, commands, patterns, file) |
| src/security/acl_dst.rs | ACL DST harness with shadow state |
| tests/acl_dst_test.rs | ACL DST multi-seed integration tests |
| src/streaming/wal.rs | WAL entry format, writer, reader, rotator |
| src/streaming/wal_store.rs | WAL storage trait + local/simulated impls |
| src/streaming/wal_actor.rs | WAL group commit actor (turbopuffer pattern) |
| src/streaming/wal_config.rs | WAL configuration (fsync policies, rotation) |
| src/streaming/wal_dst.rs | WAL DST harness with crash/recovery testing |
| tests/wal_dst_test.rs | WAL DST multi-seed integration tests |
```rust
// 1. Define trait for the I/O operation
pub trait ObjectStore: Send + Sync + 'static {
    fn put(&self, key: &str, data: &[u8]) -> impl Future<Output = IoResult<()>>;
}

// 2. Create production implementation
pub struct LocalFsObjectStore { path: PathBuf }

// 3. Create simulated implementation with fault injection
pub struct SimulatedObjectStore {
    inner: InMemoryObjectStore,
    fault_config: FaultConfig,
    rng: SimulatedRng,
}

// 4. Use generic in component
pub struct StreamingPersistence<S: ObjectStore> {
    store: S,
    // ...
}
```

```rust
enum Message {
    DoWork(Work),
    Shutdown { response: oneshot::Sender<()> },
}

async fn run(mut self) {
    while let Some(msg) = self.rx.recv().await {
        match msg {
            Message::DoWork(w) => self.handle_work(w).await,
            Message::Shutdown { response } => {
                self.cleanup().await;
                let _ = response.send(());
                break;
            }
        }
    }
}
```

When you need to connect sync code (like command execution) to async actors:
```rust
// Use std::sync::mpsc for fire-and-forget from sync context
let (tx, rx) = std::sync::mpsc::channel();

// Bridge task drains sync channel into async actor
async fn bridge(rx: Receiver<Delta>, actor: ActorHandle) {
    loop {
        // recv_timeout returns Result (Ok, not Some); the timeout keeps
        // the loop responsive to shutdown
        if let Ok(delta) = rx.recv_timeout(Duration::from_millis(50)) {
            actor.send(delta);
        }
        if shutdown.load(Ordering::SeqCst) {
            break;
        }
    }
}
```

- Seed-based reproduction: All simulations use seeds. Save failing seeds to reproduce bugs.
- Trace logging: Use the `tracing` crate with structured logging.
- Invariant checks: Add assertions for state invariants after each operation.

- Batch operations: Accumulate deltas in a write buffer before flushing
- Async I/O: Use `tokio::spawn` for background operations
- Zero-copy where possible: Use `Bytes` for large data transfers
- Profile before optimizing: Use `cargo flamegraph`
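The batching point can be illustrated with a toy write buffer that flushes on a size threshold. `WriteBuffer` and its fields are illustrative names, not project types, and real code would perform the async I/O inside `flush`:

```rust
/// Toy delta batcher: accumulates until flush_at, then flushes as a batch.
pub struct WriteBuffer {
    pending: Vec<String>,
    flush_at: usize,
    pub flushes: usize, // exposed so the batching behavior is observable
}

impl WriteBuffer {
    pub fn new(flush_at: usize) -> Self {
        Self {
            pending: Vec::with_capacity(flush_at), // explicit allocation
            flush_at,
            flushes: 0,
        }
    }

    /// Accumulate a delta; flush only when the batch threshold is reached.
    pub fn push(&mut self, delta: String) {
        self.pending.push(delta);
        if self.pending.len() >= self.flush_at {
            self.flush();
        }
    }

    pub fn flush(&mut self) {
        if self.pending.is_empty() {
            return; // nothing to write; avoid empty I/O calls
        }
        // Real code would hand the batch to async I/O here.
        self.pending.clear();
        self.flushes += 1;
    }
}
```

Ten single-delta writes collapse into two batched flushes (plus one final partial flush), which is the write-amplification win batching buys.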
All performance comparisons MUST use Docker containers to ensure fair, reproducible results.
| Concern | Docker Solution |
|---|---|
| Resource fairness | Both servers get identical CPU/memory limits |
| Environment consistency | Same OS, kernel, networking stack |
| Reproducibility | Anyone can run identical benchmarks |
| Apples-to-apples | Eliminates "my machine is faster" bias |
Location: docker-benchmark/
| Setting | Value | Rationale |
|---|---|---|
| CPU Limit | 2 cores | Realistic server constraint |
| Memory Limit | 1GB | Realistic server constraint |
| Network | Host networking | Eliminates Docker NAT overhead |
| Requests | 100,000 | Statistically significant |
| Clients | 50 concurrent | Realistic load |
| Pipeline depths | 1, 16, 64 | Tests different access patterns |
```bash
# REQUIRED for any performance claims
cd docker-benchmark
./run-benchmarks.sh
# This runs:
# 1. Official Redis 7.4 (port 6379)
# 2. Rust implementation (port 3000)
# 3. Both non-pipelined and pipelined tests
```

```bash
# DON'T use these for Redis vs Rust comparisons:
cargo run --release --bin quick_benchmark  # Local only, no Redis comparison
cargo run --release --bin benchmark        # Local only, unfair conditions
```

The local benchmarks (quick_benchmark, benchmark) are useful for:
- Quick smoke tests during development
- Profiling and optimization work
- Regression detection between commits
But they MUST NOT be used for Redis vs Rust performance claims in documentation.
When updating performance numbers:
- Always run Docker benchmarks - never use local numbers for comparisons
- Run multiple times - take median of 3 runs to reduce variance
- Document the environment - include Docker version, host OS
- Include all pipeline depths - P=1, P=16, P=64 tell different stories
```bash
# Example workflow for updating benchmarks
cd docker-benchmark
for i in 1 2 3; do
  echo "=== Run $i ==="
  ./run-benchmarks.sh 2>&1 | tee run_$i.log
done
# Use median results in BENCHMARK_RESULTS.md
```

Every commit must be fully tested and documented:
- Full Test Suite: Run `cargo test --release` - all tests must pass
- DST Tests: Run streaming and simulator DST tests with multiple seeds
- Local Smoke Test: Run `cargo run --release --bin quick_benchmark` for quick validation
- Docker Benchmarks: Run `docker-benchmark/run-benchmarks.sh` if updating performance claims
- Documentation: Update `BENCHMARK_RESULTS.md` with Docker benchmark results if metrics change
```bash
# 1. Run full test suite
cargo test --release

# 2. Run DST batch tests (if modifying streaming/persistence)
cargo test streaming_dst --release

# 3. Run Maelstrom linearizability tests (if modifying replication)
./maelstrom/maelstrom test -w lin-kv --bin ./target/release/maelstrom_kv_replicated \
  --node-count 3 --time-limit 60 --rate 100

# 4. Quick local smoke test
cargo run --release --bin quick_benchmark

# 5. Docker benchmarks (REQUIRED for performance claims)
cd docker-benchmark && ./run-benchmarks.sh

# 6. Update BENCHMARK_RESULTS.md with Docker benchmark results

# 7. Commit with descriptive message
git commit -m "Description of changes

- What was added/changed
- Test results summary
- Docker benchmark comparison (if performance-related)"
```

| Change Type | Required Updates |
|---|---|
| New feature | Tests, README feature list |
| Performance change | Docker benchmarks, BENCHMARK_RESULTS.md (Docker numbers only) |
| Bug fix | Regression test |
| Replication change | Maelstrom test run |
| Streaming change | DST tests with fault injection |
| Command behavior change | Redis equivalence test |
| Any Redis comparison | Docker benchmarks (NEVER local benchmarks) |
When updating BENCHMARK_RESULTS.md, include:
```markdown
## [Date] - [Change Description]

### Test Configuration
- **Method**: Docker benchmarks (docker-benchmark/run-benchmarks.sh)
- **Docker**: [version]
- **Host OS**: [OS version]
- **CPU Limit**: 2 cores per container
- **Memory Limit**: 1GB per container

### Results (Docker - Fair Comparison)
| Operation | Redis 7.4 | Rust Implementation | Relative |
|-----------|-----------|---------------------|----------|
| SET (P=1) | X req/s | Y req/s | Z% |
| GET (P=1) | X req/s | Y req/s | Z% |
| SET (P=16)| X req/s | Y req/s | Z% |
| GET (P=16)| X req/s | Y req/s | Z% |
```

Important: All Redis vs Rust comparison numbers MUST come from Docker benchmarks.
ADRs in docs/adr/ are living decision logs that track architectural decisions as they evolve. They are not static documents - update them as you work.
| ADR | Title | Summary |
|---|---|---|
| 001 | Simulation-First Development | FoundationDB/TigerBeetle-style DST |
| 002 | Actor-per-Shard Architecture | Lock-free message passing |
| 003 | TigerStyle Coding Standards | Safety-first engineering |
| 004 | Anna KVS CRDT Replication | Eventual/causal consistency |
| 005 | Streaming Persistence | Cloud-native S3 storage |
| 006 | Zero-Copy RESP Parser | High-performance parsing |
| 007 | Feature Flag Optimizations | Measurable performance tuning |
| 008 | Docker Benchmark Methodology | Fair performance comparisons |
| 009 | Security - TLS and ACL | Optional TLS and Redis 6.0+ ACL |
During development, add a Decision Log entry when you:
- Choose between alternative implementations
- Discover constraints that affect the architecture
- Defer or reject a planned feature
- Change approach based on learnings
- Make trade-offs during implementation
After commits, update Implementation Status when:
- New components are implemented
- Features are validated/tested
- Items move from "Not Yet Implemented" to "Implemented"
- Add a Decision Log entry (most common):
  `| 2026-01-15 | Use batch inserts for persistence | Reduces write amplification |`
- Update Implementation Status tables as work progresses
- Update main sections only if the decision fundamentally changes the approach
When making a decision that doesn't fit existing ADRs:
- Copy `docs/adr/000-template.md` to `docs/adr/NNN-descriptive-title.md`
- Fill in Context, Decision, Consequences
- Add an initial Decision Log entry
- Update the `docs/adr/README.md` index
Core dependencies and their purposes:
- `tokio`: Async runtime
- `bincode`: Binary serialization (3-5x smaller than JSON)
- `crc32fast`: Checksums for data integrity
- `tracing`: Structured logging