RAGFS

An agentic FUSE filesystem that makes file management safe and structured for LLM agents. It provides JSON-based file operations with undo support, complete audit logging, and AI-powered features such as semantic search, auto-organization, and deduplication.

Features

  • Agent File Operations - Structured file operations with JSON feedback via the .ops/ interface
  • Safety Layer - Soft delete, audit logging, and undo support via .safety/
  • AI-Powered Management - Auto-organization, deduplication, and cleanup via .semantic/
  • Semantic Search - Query files by meaning using vector similarity search
  • Local Embeddings - Runs entirely offline using the gte-small model via Candle
  • FUSE Integration - Mount indexed directories as a virtual filesystem
  • Real-time Indexing - Watch directories for changes and update the index automatically
  • Multimodal Support - Extract content from text, code, Markdown, PDF, and images
  • Code-aware Chunking - Syntax-aware splitting using tree-sitter for source code
  • Hybrid Search - Combine vector similarity with full-text search
  • MCP Server - Claude Desktop integration for AI assistants
  • Comprehensive Testing - 270+ tests across all crates ensuring reliability
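The list above mentions hybrid search, but this README does not say how vector and full-text results are merged. Reciprocal rank fusion (RRF) is one common technique for this; the Rust sketch below illustrates it under that assumption, with the `rrf_merge` name and the constant `k = 60` chosen for illustration rather than taken from RAGFS.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion (illustrative, not necessarily what RAGFS uses):
/// each ranked list adds 1 / (k + rank) to a document's score, rank starting at 1.
/// Documents ranked well by both the vector and the full-text list rise to the top.
fn rrf_merge(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by name so the order is deterministic.
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    merged
}

fn main() {
    let vector_hits = vec!["auth.rs", "session.rs", "login.md"];
    let text_hits = vec!["login.md", "auth.rs", "README.md"];
    for (doc, score) in rrf_merge(&[vector_hits, text_hits], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```

RRF only needs ranks, not raw scores, so it sidesteps the fact that cosine similarities and full-text relevance scores live on different scales.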

Feature Status

Feature                           Status        Notes
CLI (index, query, status)        Stable        Core functionality
FUSE mount                        Stable        Linux only
Semantic search                   Stable        Vector similarity with LanceDB
Hybrid search                     Stable        Vector + full-text
Text extraction                   Stable        40+ formats
Code chunking                     Stable        Tree-sitter based
PDF extraction                    Stable        Text + embedded images
Agent operations (.ops/)          Stable        JSON feedback, batch support
Safety layer (.safety/)           Stable        Trash, history, undo
Semantic operations (.semantic/)  Beta          Organize, dedupe, cleanup
Python bindings                   Beta          PyO3 based
MCP server                        Beta          Claude Desktop integration
Image captioning                  Experimental  Optional, requires vision feature

Use Cases

Ideal for:

  • LLM agents managing files (Claude, GPT, local models)
  • Automated file organization and cleanup
  • Safe file operations with audit trail
  • Code repositories (1K-50K files)
  • Documentation collections
  • Research notes and papers
  • Local-first semantic search

Limitations:

  • Linux only (FUSE requirement)
  • The embedding model requires ~500MB of disk space
  • Large repositories (100K+ files) may need tuning

Requirements

  • Rust 1.88 or later
  • Linux with FUSE support (libfuse-dev on Debian/Ubuntu, fuse on Arch)
  • ~500MB disk space for the embedding model (downloaded on first run)

Installation

# Clone the repository
git clone https://github.com/Venere-Labs/ragfs.git
cd ragfs

# Build in release mode
cargo build --release

# Install to ~/.cargo/bin
cargo install --path crates/ragfs

Quick Start

Index a directory

# Index all files in a directory
ragfs index ~/Documents

# Watch for changes (continuous indexing)
ragfs index ~/Documents --watch

Search your files

# Semantic search
ragfs query ~/Documents "machine learning implementation"

# Get more results
ragfs query ~/Documents "authentication logic" --limit 20

# JSON output for scripting
ragfs query ~/Documents "database connection" --format json
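For scripting from Rust rather than the shell, the same JSON query can be issued with `std::process::Command`. This is a minimal sketch: `build_query_command` and `run_query` are hypothetical helpers, only the `--limit` and `--format json` flags shown above are used, and the JSON is returned unparsed because its schema is not documented here.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: build a `ragfs query` invocation using only flags shown
/// in this README (`--limit`, `--format json`). Building a Command does not run it.
fn build_query_command(index_dir: &Path, query: &str, limit: u32) -> Command {
    let mut cmd = Command::new("ragfs");
    cmd.arg("query")
        .arg(index_dir)
        .arg(query)
        .arg("--limit")
        .arg(limit.to_string())
        .arg("--format")
        .arg("json");
    cmd
}

/// Hypothetical helper: run the command and return stdout as text.
/// Non-zero exit statuses are turned into an error carrying stderr.
fn run_query(mut cmd: Command) -> std::io::Result<String> {
    let output = cmd.output()?;
    if !output.status.success() {
        let msg = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(std::io::Error::other(msg));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    let cmd = build_query_command(Path::new("/home/me/Documents"), "database connection", 5);
    match run_query(cmd) {
        Ok(json) => println!("{json}"),
        Err(e) => eprintln!("query failed (is ragfs installed and the directory indexed?): {e}"),
    }
}
```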

Mount as a filesystem

# Create a mount point
mkdir ~/ragfs-mount

# Mount the indexed directory
ragfs mount ~/Documents ~/ragfs-mount --foreground

Check index status

ragfs status ~/Documents

Agent file operations (via FUSE mount)

# Create a file with feedback
echo -e "docs/new.md\n# New Document" > ~/ragfs-mount/.ragfs/.ops/.create
cat ~/ragfs-mount/.ragfs/.ops/.result  # JSON with undo_id

# Delete a file (soft delete to trash)
echo "docs/old.md" > ~/ragfs-mount/.ragfs/.ops/.delete

# Find similar files
echo "src/main.rs" > ~/ragfs-mount/.ragfs/.semantic/.similar
cat ~/ragfs-mount/.ragfs/.semantic/.similar

# Undo an operation
echo "<undo_id>" > ~/ragfs-mount/.ragfs/.safety/.undo
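The `.ops/` and `.safety/` control files are ordinary files on the mount, so any language can drive them. Below is a minimal Rust sketch, assuming the write formats shown in the shell examples above (target path on the first line and file body after it for `.create`; a single path or `undo_id` otherwise). The helper names are hypothetical, and the `.result` JSON is returned as raw text because its schema is not documented here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// `.ragfs/.ops/` under the mount point, as in the shell examples.
fn ops_dir(mount: &Path) -> PathBuf {
    mount.join(".ragfs").join(".ops")
}

/// Hypothetical helper: request creation of `rel_path` with `content`, then read
/// the JSON feedback (which should include an undo_id) from `.ops/.result`.
/// Writes the same bytes as `echo -e "path\ncontent" > .create`.
fn create_file(mount: &Path, rel_path: &str, content: &str) -> io::Result<String> {
    fs::write(ops_dir(mount).join(".create"), format!("{rel_path}\n{content}\n"))?;
    fs::read_to_string(ops_dir(mount).join(".result"))
}

/// Hypothetical helper: soft-delete a file (RAGFS moves it to trash).
fn delete_file(mount: &Path, rel_path: &str) -> io::Result<()> {
    fs::write(ops_dir(mount).join(".delete"), format!("{rel_path}\n"))
}

/// Hypothetical helper: undo a previous operation by its undo_id.
fn undo(mount: &Path, undo_id: &str) -> io::Result<()> {
    fs::write(mount.join(".ragfs").join(".safety").join(".undo"), format!("{undo_id}\n"))
}

fn main() {
    let mount = Path::new("/home/me/ragfs-mount"); // hypothetical mount point
    match create_file(mount, "docs/new.md", "# New Document") {
        Ok(json) => println!("result: {json}"),
        Err(e) => eprintln!("is RAGFS mounted there? {e}"),
    }
}
```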

CLI Reference

ragfs [OPTIONS] <COMMAND>

Commands:
  mount   Mount a directory as a RAGFS filesystem
  index   Index a directory (without mounting)
  query   Query the index
  status  Show index status
  config  Manage configuration

Options:
  -c, --config <FILE>    Config file path [default: ~/.config/ragfs/config.toml]
  -v, --verbose          Enable verbose logging
  -f, --format <FORMAT>  Output format: text, json [default: text]
  -h, --help             Print help
  -V, --version          Print version

mount

ragfs mount <SOURCE> <MOUNTPOINT> [OPTIONS]

Arguments:
  <SOURCE>      Source directory to index
  <MOUNTPOINT>  Mount point

Options:
  -f, --foreground  Run in foreground (don't daemonize)
      --allow-other Allow other users to access the mount

index

ragfs index <PATH> [OPTIONS]

Arguments:
  <PATH>  Directory to index

Options:
  -f, --force  Force reindexing of all files
  -w, --watch  Watch for changes after initial indexing

query

ragfs query <PATH> <QUERY> [OPTIONS]

Arguments:
  <PATH>   Path to indexed directory
  <QUERY>  Query string

Options:
  -l, --limit <LIMIT>  Maximum results [default: 10]

status

ragfs status <PATH>

Arguments:
  <PATH>  Path to indexed directory

config

ragfs config <ACTION>

Actions:
  show  Display current configuration
  init  Print sample config file
  path  Print config file path

Architecture

RAGFS is organized as a Rust workspace with specialized crates:

Crate           Description
ragfs           CLI application
ragfs-core      Core traits and types
ragfs-fuse      FUSE filesystem implementation
ragfs-index     File indexing engine
ragfs-chunker   Document chunking strategies
ragfs-embed     Embedding generation (Candle)
ragfs-extract   Content extraction
ragfs-store     Vector storage (LanceDB)
ragfs-query     Query execution

See docs/ARCHITECTURE.md for detailed architecture documentation.
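As an illustration of how a core-traits crate can keep the indexing engine independent of concrete extractors, chunkers, embedders, and stores, here is a toy pipeline. Every trait, struct, and function name below is hypothetical and is not ragfs-core's actual API; the implementations are deliberately trivial stand-ins.

```rust
// HYPOTHETICAL stage traits, loosely mirroring the crates in the table above.
trait Extractor { fn extract(&self, raw: &[u8]) -> String; }
trait Chunker { fn chunk(&self, text: &str) -> Vec<String>; }
trait Embedder { fn embed(&self, chunk: &str) -> Vec<f32>; }
trait VectorStore {
    fn insert(&mut self, id: String, vector: Vec<f32>);
    fn len(&self) -> usize;
}

// Toy stand-ins for extraction, chunking, embedding, and storage.
struct Utf8Extractor;
impl Extractor for Utf8Extractor {
    fn extract(&self, raw: &[u8]) -> String { String::from_utf8_lossy(raw).into_owned() }
}

/// One chunk per non-blank line (real chunkers use token windows or tree-sitter).
struct LineChunker;
impl Chunker for LineChunker {
    fn chunk(&self, text: &str) -> Vec<String> {
        text.lines().filter(|l| !l.trim().is_empty()).map(str::to_owned).collect()
    }
}

/// Fake 3-dimensional "embedding": [byte length, word count, 1.0].
/// Real gte-small embeddings have 384 dimensions.
struct LengthEmbedder;
impl Embedder for LengthEmbedder {
    fn embed(&self, chunk: &str) -> Vec<f32> {
        vec![chunk.len() as f32, chunk.split_whitespace().count() as f32, 1.0]
    }
}

#[derive(Default)]
struct MemStore { rows: Vec<(String, Vec<f32>)> }
impl VectorStore for MemStore {
    fn insert(&mut self, id: String, vector: Vec<f32>) { self.rows.push((id, vector)); }
    fn len(&self) -> usize { self.rows.len() }
}

/// The engine sees only trait objects, so any stage can be swapped independently.
/// Returns the number of chunks stored for this file.
fn index_file(
    path: &str,
    raw: &[u8],
    ex: &dyn Extractor,
    ch: &dyn Chunker,
    em: &dyn Embedder,
    store: &mut dyn VectorStore,
) -> usize {
    let chunks = ch.chunk(&ex.extract(raw));
    for (i, chunk) in chunks.iter().enumerate() {
        store.insert(format!("{path}#{i}"), em.embed(chunk));
    }
    chunks.len()
}

fn main() {
    let mut store = MemStore::default();
    let n = index_file("notes.md", b"# Title\n\nFirst line\nSecond line\n",
        &Utf8Extractor, &LineChunker, &LengthEmbedder, &mut store);
    println!("indexed {n} chunks; store holds {} rows", store.len());
}
```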

Documentation

How It Works

  1. Extraction - Content is extracted from files based on their MIME type
  2. Chunking - Text is split into overlapping chunks (~512 tokens each)
  3. Embedding - Each chunk is converted to a 384-dimensional vector using the gte-small model
  4. Storage - Vectors are stored in LanceDB for efficient similarity search
  5. Search - Queries are embedded and matched against stored vectors using cosine similarity
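Steps 2 and 5 can be sketched in a few lines of Rust. Assumptions: tokens are approximated by whitespace-separated words (the real chunker counts model tokens and uses tree-sitter for code), the overlap size is a free parameter because the README does not state it, and search is a brute-force scan, whereas a vector database such as LanceDB can use an approximate-nearest-neighbor index instead. The function names are illustrative.

```rust
/// Split `tokens` into windows of `size` tokens, each sharing `overlap` tokens
/// with the previous window. Panics unless overlap < size.
fn chunk_tokens<'a>(tokens: &[&'a str], size: usize, overlap: usize) -> Vec<Vec<&'a str>> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break; // last window reached the end; avoid a redundant tail chunk
        }
        start += step;
    }
    chunks
}

/// Cosine similarity dot(a, b) / (|a| * |b|); returns 0.0 for a zero vector.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-k: score every stored chunk against the query embedding.
fn top_k(query: &[f32], stored: &[(&str, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = stored
        .iter()
        .map(|(id, v)| (id.to_string(), cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let tokens: Vec<&str> = "one two three four five six seven".split_whitespace().collect();
    for chunk in chunk_tokens(&tokens, 4, 2) {
        println!("{}", chunk.join(" "));
    }
    let stored = vec![("a.md#0", vec![1.0f32, 0.0]), ("b.md#0", vec![0.6f32, 0.8])];
    println!("{:?}", top_k(&[0.0, 1.0], &stored, 1));
}
```

If embeddings are L2-normalized when stored, cosine similarity reduces to a plain dot product, which avoids recomputing norms at query time.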

Storage Locations

  • Indices: ~/.local/share/ragfs/indices/{hash}/index.lance
  • Models: ~/.local/share/ragfs/models/
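The README does not say what `{hash}` is computed from. Assuming it is derived from the indexed directory's path, a helper for locating an index could look like the sketch below. `index_path` is hypothetical and `DefaultHasher` is a placeholder, so the directory names it produces will not match the ones RAGFS actually creates.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};
use std::path::{Path, PathBuf};

/// HYPOTHETICAL: <home>/.local/share/ragfs/indices/{hash}/index.lance.
/// Assumes {hash} is derived from the source directory path; DefaultHasher is
/// a stand-in for whatever hash RAGFS really uses.
fn index_path(home: &Path, source: &Path) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    let hash = format!("{:016x}", hasher.finish());
    home.join(".local/share/ragfs/indices")
        .join(hash)
        .join("index.lance")
}

fn main() {
    let home = PathBuf::from(std::env::var("HOME").unwrap_or_else(|_| "/root".to_string()));
    let docs = home.join("Documents");
    // Canonicalize so relative paths and symlinks map to one key (also an assumption).
    let source = std::fs::canonicalize(&docs).unwrap_or(docs);
    println!("{}", index_path(&home, &source).display());
}
```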

License

Licensed under either of:

  • Apache License 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

at your option.

Contributing

See CONTRIBUTING.md for guidelines.

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
