This note captures the next architectural step for the derived graph layer after the first real rebuild benchmarks on March 24, 2026.
The goal is not to replace the current model.
The goal is to make graph work more practical while preserving the existing rules:
- canonical history remains append-only source truth
- graph assertions remain derived
- full rebuild from canonical source remains possible
- scoped views and lenses remain first-class goals
Post-CPU-upgrade benchmark on the real local event store:
- canonical event files scanned: 17,820
- distinct graph assertions derived: 89,254
- derivation phase: 01:19:52
- full rebuild wall-clock: 01:20:31
- peak resident memory: about 454 MB
Earlier baseline before the VM CPU increase:
- canonical event files scanned: 17,302
- distinct graph assertions derived: 84,744
- full rebuild wall-clock: 01:29:26
Benchmark using the March 23, 2026 ChatGPT export against a temp copy of the committed pre-import event store:
- conversations seen: 88
- messages seen: 5,599
- artifact references seen: 71
- new canonical events appended: 515
- duplicates skipped: 5,246
- workflow-reported completion: 00:00:03
- end-to-end CLI wall-clock: 00:00:09.44
The important pattern is:
- canonical import is already incremental and relatively cheap
- full graph rebuild is still a full-source operation and is expensive
- RAM is not the bottleneck
- I/O is present but not obviously saturated
- CPU helps, but the current rebuild path does not scale linearly with more cores
This suggests that the next major gains will come from changing the graph materialization flow rather than adding more hardware.
Full graph rebuild from canonical source is a heavyweight maintenance operation.
It should remain available because:
- it is the correctness fallback
- it proves the graph can be reconstructed from canonical history
- it protects NEXUS from depending on an opaque mutable cache
But it should not be treated as the normal feedback loop for day-to-day work.
The graph side should be treated as three distinct layers:
The first layer is canonical history, which remains the durable source of truth:
- append-only canonical events
- import manifests
- projections derived from canonical history
The second layer is the current graph/assertions/*.toml layer:
- rebuildable from canonical history
- durable enough to inspect, diff, and export
- still too expensive to regenerate casually at full scale
This layer should remain because it preserves a transparent derived history format.
The third layer is the missing piece.
It should be a local materialized working substrate optimized for:
- interactive graph queries
- visualization preparation
- batch generation
- lens experimentation
- incremental updates
This layer is not source truth. It is a practical working index/cache/materialization over the durable derived graph.
The rebuild-graph-assertions command should remain a deliberate operation.
Likely future UX:
- clear warning/help text that it rewrites the full derived graph layer
- optional explicit confirmation or a --yes flag for large real stores
- rebuild metrics captured in a small manifest/log
Most working updates should happen incrementally from the import path, not through full rebuild.
Practical units of work:
- import-scoped updates
- provider-scoped updates
- conversation-scoped updates
- artifact-scoped updates
- later lens-scoped or domain-scoped updates
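One way to make these units concrete is to bucket newly imported canonical events by scope before deriving, so only the affected slices of the graph are touched. The names below are illustrative assumptions, not the actual NEXUS API:

```python
from collections import defaultdict
from typing import Callable, Iterable

def plan_incremental_updates(new_events: Iterable[dict],
                             scope_of: Callable[[dict], str]) -> dict[str, list[dict]]:
    """Group new canonical events into scoped work units (provider,
    conversation, import batch, ...) so each unit can be re-derived
    independently instead of triggering a full rebuild."""
    work: dict[str, list[dict]] = defaultdict(list)
    for event in new_events:
        work[scope_of(event)].append(event)
    return dict(work)
```

An import that touched two conversations would then re-derive only those two scoped slices, leaving the rest of the materialized graph alone.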
Most human workflows do not need the whole graph at once.
The common working unit is more likely to be:
- a conversation cluster
- a concept cluster such as FnHCI
- a provider scope
- an import batch
- a domain or bounded-context scope
This aligns with the earlier Graphviz experiments: smaller scoped graphs are more useful than one giant picture.
The full graph does not need to be materialized for everyday work. Most actual work should be chunked or sliced. The full graph is important for correctness checks, complete exports, and periodic rebuild validation, but it should not be the only mode.
Sharded, parallel rebuild is worth pursuing eventually.
It becomes practical once derivation/materialization is explicit about shard boundaries such as:
- import
- provider
- conversation
- domain
At that point, jobs can be processed independently and merged by deterministic fact IDs.
This is a future direction, not a current requirement.
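A sketch of why deterministic fact IDs make independent shard processing safe to merge: hashing the canonical content of an assertion yields the same ID no matter which shard derived it, so merging shard outputs is a plain keyed union. The triple shape and helper names are assumptions for illustration:

```python
import hashlib

def fact_id(subject: str, predicate: str, obj: str) -> str:
    """Derive a stable ID from the canonical content of an assertion.
    The same (subject, predicate, object) triple always hashes to the
    same ID, regardless of which shard or process derived it."""
    # Join with a unit separator so ("ab","c") and ("a","bc") cannot collide.
    canon = "\x1f".join((subject, predicate, obj))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()[:16]

def merge_shards(*shards: dict) -> dict:
    """Union shard outputs keyed by fact ID; duplicates collapse for free,
    so merge order does not matter."""
    merged: dict = {}
    for shard in shards:
        merged.update(shard)
    return merged
```

Two shards that both derive the same fact collapse to one entry, which is what makes the merge deterministic and order-independent.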
GPU acceleration is probably not worth pursuing yet.
The current workload is dominated by:
- parsing
- string/key normalization
- dictionary/set work
- many small filesystem operations
That is not the kind of workload where a GPU is likely to be the first meaningful win.
The secondary working layer should be designed as an implementation detail behind a stable materialization boundary.
Candidate shapes:
- SQLite-backed working graph index
- DuckDB-backed analytic read model
- later specialized graph store if the need becomes real
Current recommendation:
- prefer a simple local embedded store first
- keep the durable TOML layers
- do not force a heavyweight external service too early
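As a sketch of the simple-embedded-store option, a SQLite-backed working index could look like the following. The schema and function names are hypothetical; the durable TOML layer stays authoritative, and this cache can always be dropped and re-materialized:

```python
import sqlite3

def open_working_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local working graph index. This is a
    disposable materialization over the durable assertion layer,
    never source truth."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS assertions (
               fact_id   TEXT PRIMARY KEY,
               subject   TEXT NOT NULL,
               predicate TEXT NOT NULL,
               object    TEXT NOT NULL,
               scope     TEXT          -- e.g. provider, conversation, import batch
           )"""
    )
    con.execute("CREATE INDEX IF NOT EXISTS idx_subject ON assertions(subject)")
    return con

def upsert_assertions(con: sqlite3.Connection, rows: list) -> None:
    """Idempotent load keyed by fact ID: re-materializing the same
    facts is a no-op, which is what incremental updates rely on."""
    con.executemany(
        "INSERT OR REPLACE INTO assertions VALUES (?, ?, ?, ?, ?)", rows
    )
    con.commit()
```

Because loads are idempotent, the incremental import path can feed scoped batches into this index without coordinating with a full rebuild.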
See also:
- Add explicit heavyweight-language and confirmation flow around full graph rebuilds.
- Record rebuild metrics in a small manifest/log so performance changes can be compared over time.
- Introduce a graph materializer abstraction separate from full TOML assertion rebuild.
- Implement a first local secondary working layer for incremental graph updates.
- Update the import path so new canonical events can feed incremental graph materialization without forcing a full rebuild.
- Keep Graphviz and later FnHCI visualization workflows focused on slices first, full graph second.
For now:
- keep canonical import incremental
- keep full graph rebuild available and truthful
- stop treating full rebuild as the default follow-up after every import
- build the secondary graph working layer next
That gives NEXUS a better practical flow without giving up correctness or rebuildability.