An agentic FUSE filesystem that makes file management safe and structured for LLM agents. Includes JSON-based operations with undo support, complete audit logging, and AI-powered features like semantic search, auto-organization, and deduplication.
- Agent File Operations - Structured file ops with JSON feedback via the `.ops/` interface
- Safety Layer - Soft delete, audit logging, and undo support via `.safety/`
- AI-Powered Management - Auto-organization, deduplication, and cleanup via `.semantic/`
- Semantic Search - Query files by meaning using vector similarity search
- Local Embeddings - Runs entirely offline using the `gte-small` model via Candle
- FUSE Integration - Mount indexed directories as a virtual filesystem
- Real-time Indexing - Watch directories for changes and update the index automatically
- Multimodal Support - Extract content from text, code, markdown, PDF, and images
- Code-aware Chunking - Syntax-aware splitting using tree-sitter for source code
- Hybrid Search - Combine vector similarity with full-text search
- MCP Server - Claude Desktop integration for AI assistants
- Comprehensive Testing - 270+ tests across all crates ensuring reliability
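Hybrid search combines a vector ranking with a full-text ranking, but this README does not say how ragfs merges the two. One widely used technique is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as ragfs's documented method: each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. The document IDs and `k = 60.0` are illustrative.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion (an assumed fusion method, not necessarily what ragfs uses).
/// Each input list is ordered best-first; ranks start at 1.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            // A document found by both searches accumulates two contributions.
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so the output order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Fusing by rank sidesteps the fact that vector and full-text scores live on different scales, and a file ranked well by both searches can outrank one that tops only a single list.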
| Feature | Status | Notes |
|---|---|---|
| CLI (index, query, status) | Stable | Core functionality |
| FUSE mount | Stable | Linux only |
| Semantic search | Stable | Vector similarity with LanceDB |
| Hybrid search | Stable | Vector + full-text |
| Text extraction | Stable | 40+ formats |
| Code chunking | Stable | Tree-sitter based |
| PDF extraction | Stable | Text + embedded images |
| Agent operations (.ops/) | Stable | JSON feedback, batch support |
| Safety layer (.safety/) | Stable | Trash, history, undo |
| Semantic operations (.semantic/) | Beta | Organize, dedupe, cleanup |
| Python bindings | Beta | PyO3 based |
| MCP server | Beta | Claude Desktop integration |
| Image captioning | Experimental | Optional, requires vision feature |
Ideal for:
- LLM agents managing files (Claude, GPT, local models)
- Automated file organization and cleanup
- Safe file operations with audit trail
- Code repositories (1K-50K files)
- Documentation collections
- Research notes and papers
- Local-first semantic search
Limitations:
- Linux only (FUSE requirement)
- Embedding model requires ~500MB disk
- Large repositories (100K+ files) may need tuning
Requirements:
- Rust 1.88 or later
- Linux with FUSE support (`libfuse-dev` on Debian/Ubuntu, `fuse` on Arch)
- ~500MB disk space for the embedding model (downloaded on first run)
Installation:

```bash
# Clone the repository
git clone https://github.com/Venere-Labs/ragfs.git
cd ragfs

# Build in release mode
cargo build --release

# Install to ~/.cargo/bin
cargo install --path crates/ragfs
```

Index a directory:

```bash
# Index all files in a directory
ragfs index ~/Documents

# Watch for changes (continuous indexing)
ragfs index ~/Documents --watch
```

Search:

```bash
# Semantic search
ragfs query ~/Documents "machine learning implementation"

# Get more results
ragfs query ~/Documents "authentication logic" --limit 20

# JSON output for scripting
ragfs query ~/Documents "database connection" --format json
```

Mount as a filesystem:

```bash
# Create a mount point
mkdir ~/ragfs-mount

# Mount the indexed directory
ragfs mount ~/Documents ~/ragfs-mount --foreground
```

Check index status:

```bash
ragfs status ~/Documents
```

Agent operations (run from a second terminal while the mount is active):

```bash
# Create a file with feedback
echo -e "docs/new.md\n# New Document" > ~/ragfs-mount/.ragfs/.ops/.create
cat ~/ragfs-mount/.ragfs/.ops/.result  # JSON with undo_id

# Delete a file (soft delete to trash)
echo "docs/old.md" > ~/ragfs-mount/.ragfs/.ops/.delete

# Find similar files
echo "src/main.rs" > ~/ragfs-mount/.ragfs/.semantic/.similar
cat ~/ragfs-mount/.ragfs/.semantic/.similar

# Undo an operation
echo "<undo_id>" > ~/ragfs-mount/.ragfs/.safety/.undo
```

CLI reference:

Usage: `ragfs [OPTIONS] <COMMAND>`
```text
Commands:
  mount   Mount a directory as a RAGFS filesystem
  index   Index a directory (without mounting)
  query   Query the index
  status  Show index status
  config  Manage configuration

Options:
  -c, --config <FILE>    Config file path [default: ~/.config/ragfs/config.toml]
  -v, --verbose          Enable verbose logging
  -f, --format <FORMAT>  Output format: text, json [default: text]
  -h, --help             Print help
  -V, --version          Print version
```
```text
ragfs mount <SOURCE> <MOUNTPOINT> [OPTIONS]

Arguments:
  <SOURCE>      Source directory to index
  <MOUNTPOINT>  Mount point

Options:
  -f, --foreground   Run in foreground (don't daemonize)
      --allow-other  Allow other users to access the mount
```
```text
ragfs index <PATH> [OPTIONS]

Arguments:
  <PATH>  Directory to index

Options:
  -f, --force  Force reindexing of all files
  -w, --watch  Watch for changes after initial indexing
```
```text
ragfs query <PATH> <QUERY> [OPTIONS]

Arguments:
  <PATH>   Path to indexed directory
  <QUERY>  Query string

Options:
  -l, --limit <LIMIT>  Maximum results [default: 10]
```
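For scripting, `ragfs query` with `--format json` can be invoked from Rust through `std::process::Command`. This is a sketch assuming `ragfs` is on `PATH`; the helper names are hypothetical, and the JSON is returned as a raw string because its schema is not described in this README. `Command` does not go through a shell, so `~` is not expanded: pass an absolute path rather than `~/Documents`.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: arguments for `ragfs query <PATH> <QUERY> --limit N --format json`.
fn query_args(path: &str, query: &str, limit: usize) -> Vec<String> {
    vec![
        "query".to_string(),
        path.to_string(),
        query.to_string(),
        "--limit".to_string(),
        limit.to_string(),
        "--format".to_string(),
        "json".to_string(),
    ]
}

/// Hypothetical helper: run `ragfs` and return its stdout (the JSON results, unparsed).
fn run_query_json(path: &str, query: &str, limit: usize) -> io::Result<String> {
    let output = Command::new("ragfs").args(query_args(path, query, limit)).output()?;
    if !output.status.success() {
        // Surface ragfs's own error message instead of an empty result.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```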
```text
ragfs status <PATH>

Arguments:
  <PATH>  Path to indexed directory
```
```text
ragfs config <ACTION>

Actions:
  show  Display current configuration
  init  Print sample config file
  path  Print config file path
```
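The `.ops/` workflow from the quick start (write a request to a control file, then read `.ops/.result`) can also be driven from a program. Below is a minimal std-only Rust sketch. It assumes `.create` takes the target path on the first line and the file content after it, as in the `echo -e` example. The helper names are hypothetical, and the JSON in `.result` is returned unparsed because its schema (beyond containing an `undo_id`) is not documented here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: path of a control file under `<mount>/.ragfs/`, e.g. `.ops/.create`.
fn control_file(mount: &Path, rel: &str) -> PathBuf {
    mount.join(".ragfs").join(rel)
}

/// Build a `.create` request: target path on the first line, then the content.
/// Mirrors `echo -e "docs/new.md\n# New Document"`, including echo's trailing newline.
fn create_request(target: &str, content: &str) -> String {
    format!("{target}\n{content}\n")
}

/// Write a create request, then return the raw JSON feedback from `.ops/.result`.
fn create_via_ops(mount: &Path, target: &str, content: &str) -> io::Result<String> {
    fs::write(control_file(mount, ".ops/.create"), create_request(target, content))?;
    fs::read_to_string(control_file(mount, ".ops/.result"))
}
```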
RAGFS is organized as a Rust workspace with specialized crates:
| Crate | Description |
|---|---|
| `ragfs` | CLI application |
| `ragfs-core` | Core traits and types |
| `ragfs-fuse` | FUSE filesystem implementation |
| `ragfs-index` | File indexing engine |
| `ragfs-chunker` | Document chunking strategies |
| `ragfs-embed` | Embedding generation (Candle) |
| `ragfs-extract` | Content extraction |
| `ragfs-store` | Vector storage (LanceDB) |
| `ragfs-query` | Query execution |
See docs/ARCHITECTURE.md for detailed architecture documentation.
Documentation:
- Getting Started - 5-minute tutorial
- User Guide - Complete CLI reference
- Configuration - All config options
- Performance Guide - Tuning and optimization
- Troubleshooting - Common issues and solutions
- Architecture - Technical deep-dive
- Architecture Decisions - Why we made these choices
- API Reference - Library usage and types
- Python Bindings - Python SDK and framework integrations
- MCP Server - Claude Desktop integration
- Development Guide - Contributing to RAGFS
How it works:
- Extraction - Content is extracted from files based on their MIME type
- Chunking - Text is split into overlapping chunks (~512 tokens each)
- Embedding - Each chunk is converted to a 384-dimensional vector using the `gte-small` model
- Storage - Vectors are stored in LanceDB for efficient similarity search
- Search - Queries are embedded and matched against stored vectors using cosine similarity
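The chunking and search steps above can be sketched in plain Rust. This is an illustration under stated assumptions, not ragfs's code. Whitespace-separated words stand in for the tokenizer's tokens, and the overlap size is a hypothetical parameter (the text only gives ~512 tokens per chunk). A brute-force scan stands in for LanceDB's similarity search, and toy vectors stand in for 384-dimensional `gte-small` embeddings.

```rust
/// Split `text` into windows of at most `size` tokens; consecutive windows share
/// `overlap` tokens. Whitespace-separated words stand in for model tokens (assumption).
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than the chunk size");
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Cosine similarity of two equal-length vectors (0.0 if either has zero norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force top-k: (index, score) of the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest similarity first
    scored.truncate(k);
    scored
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap. The linear scan costs one comparison per stored vector per query, which an indexed vector store is designed to avoid at scale.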
Storage locations:
- Indices: `~/.local/share/ragfs/indices/{hash}/index.lance`
- Models: `~/.local/share/ragfs/models/`
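Because each index lives in its own `{hash}` directory, one way to see which indices exist is to look for `index.lance` entries under the indices directory. This is a std-only sketch. `list_indices` is a hypothetical helper, and it does not map a `{hash}` back to its source directory because the hashing scheme is not documented here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list `<data_dir>/indices/{hash}/index.lance` paths that exist,
/// where `data_dir` is e.g. `~/.local/share/ragfs` (with `~` already expanded).
/// Returns an error if the `indices` directory itself is missing.
fn list_indices(data_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(data_dir.join("indices"))? {
        // A Lance dataset is stored as a directory, so `exists()` covers it.
        let lance = entry?.path().join("index.lance");
        if lance.exists() {
            found.push(lance);
        }
    }
    found.sort(); // read_dir order is unspecified; sort for stable output
    Ok(found)
}
```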
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
See CONTRIBUTING.md for guidelines.