-
Notifications
You must be signed in to change notification settings - Fork 143
Description
Problem
OnDiskGraphIndex is immutable. Today, to delete nodes or perform compaction, clients must:
- Load each on-disk graph into an in-memory
OnHeapGraphIndex. - Apply deletions.
- Rewrite a new on-disk graph from the fully materialized heap index.
For large graphs (millions of nodes), this approach requires loading the full graph structure, which frequently leads to OOM and does not scale.
Proposal
Introduce an OnDiskGraphIndexCompactor that performs streaming N:1 compaction over multiple on-disk indexes without loading them into memory.
The compactor provides:
-
Delete Tracking
Track live/dead nodes using an in-memoryFixedBitSet, allowing deletion without rewriting the index or constructing anOnHeapGraphIndex. -
Streaming N:1 Compaction that performs streaming compaction
Write a newOnDiskGraphIndexvia a streaming writer (no full materialization of the graph). Because the existing writer does not support streaming individual nodes, this proposal includes implementing a newCompactorWriter. -
Custom Ordinal Remapping
Allow clients to pass a customOridnalMapperthat remaps ordinals from each source index to ordinals in the compacted output index.
Compactor Internal State
List<ImmutableGraphIndex> sources; // Source graph indexes to compact
Map<ImmutableGraphIndex, FixedBitSet> livenodes; // Tracks which ordinals are live
Map<ImmutableGraphIndex, OrdinalMapper> remappers; // Maps old ordinals -> new ordinalsAPI Usage
// Create a compactor from multiple immutable graph indexes
OnDiskGraphIndexCompactor compactor =
new OnDiskGraphIndexCompactor(List.of(source1, source2, source3));
// Mark which nodes are still live in each source
compactor.setLiveNodes(source1, liveNodes1); // FixedBitSet
compactor.setLiveNodes(source2, liveNodes2);
// Supply remapping functions for each source
compactor.setRemapper(source1, remapper1);
compactor.setRemapper(source2, remapper2);
// Execute streaming compaction
OnDiskGraphIndex result = compactor.compact(outputPath, numNeighbors);Implementation of Compaction
-
Traverse each node in each index
Iterate through all ordinals across all sources and skip deleted ones. -
Perform search across all sources
For each node, run a multi-source search to retrieve top-K candidates from all indexes. -
Apply pruning
Apply standard HNSW pruning rules to maintain neighbor limits and graph quality. -
Apply ordinal remapping
Convert both node ordinal and neighbor ordinals using the per-source remappers. -
Stream the node to I/O
Use the newCompactorWriterto write each node sequentially to disk.