Skip to content

Latest commit

Β 

History

History
73 lines (50 loc) Β· 2.16 KB

File metadata and controls

73 lines (50 loc) Β· 2.16 KB

Technical Specification - Graph Rag

Version: v1.2.3 | Status: Active | Last Updated: March 2026

Module: codomyrmex.graph_rag
Last Updated: 2026-01-29

1. Purpose

Knowledge graph integration with RAG for structured knowledge retrieval and reasoning

2. Architecture

2.1 Components

graph_rag/
β”œβ”€β”€ __init__.py          # Module exports
β”œβ”€β”€ README.md            # Documentation
β”œβ”€β”€ AGENTS.md            # Agent guidelines
β”œβ”€β”€ SPEC.md              # This file
β”œβ”€β”€ PAI.md               # Personal AI context
└── core.py              # Core implementation

2.2 Dependencies

  • Python 3.10+
  • Parent module: codomyrmex

3. Interfaces

3.1 Public API

from codomyrmex.graph_rag import (
    KnowledgeGraph,    # Core knowledge graph with entities and relationships
    GraphRAGPipeline,  # End-to-end RAG pipeline with graph context
    Entity, EntityType,
    Relationship, RelationType,
    GraphContext,
)

3.2 Configuration

Environment variables:

  • CODOMYRMEX_*: Configuration options

4. Implementation Notes

4.1 Design Decisions

  1. In-memory graph: KnowledgeGraph stores entities and relationships in-memory dicts; suitable for session-scoped knowledge, not persistent storage.
  2. Pipeline pattern: GraphRAGPipeline wraps graph construction + retrieval into a single callable, keeping consumer code simple.

4.2 Limitations

  • Graph is not persisted across sessions (in-memory only); use agentic_memory for durable storage.
  • No graph database backend β€” does not support very large-scale graphs (>100K nodes).

5. Testing

# Run tests for this module
pytest tests/graph_rag/

6. Future Considerations

  • Graph database backend integration (Neo4j, NetworkX) to lift the current 100K-node ceiling on in-memory graphs.
  • Persistent graph storage across sessions: serialize the knowledge graph to disk and reload incrementally rather than rebuilding from source on every session start.
  • Multi-hop reasoning chains: expose a query interface that traces paths of configurable depth through the graph and returns confidence-weighted inference chains.