
YASSERRMD/barq-db


Barq DB


Retrieval-Focused Data System for AI Applications
Vector Search · Hybrid Retrieval · Ingestion-Aware Architecture



Overview

Barq-DB v2 is a retrieval-focused data system built in Rust for modern AI workloads.

It combines:

  • Dense vector search
  • BM25 text retrieval
  • Async ingestion pipelines
  • Segment-based storage lifecycle

into a unified architecture designed for:

  • RAG systems
  • semantic search
  • AI-powered recommendations

Why Barq DB

Barq-DB is designed as a retrieval system rather than a standalone vector store.

Ingestion, indexing, and querying are treated as coordinated stages of a single pipeline, enabling better control over performance, memory usage, and long-running stability.


Key Highlights (v2)

Memory Control

  • Disk-backed vector storage using mmap
  • Configurable memory budgeting and eviction
  • Reduced RAM pressure for large datasets
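As an illustration of disk-backed vector access through mmap, the following minimal sketch stores fixed-size float32 vectors in a flat file and reads one back without loading the whole file. The file layout, the `DIM` constant, and the helper names are assumptions made for this example; they are not Barq DB's actual on-disk format.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                               # hypothetical vector dimension
RECORD = struct.Struct(f"<{DIM}f")    # one little-endian float32 vector


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-size records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


def read_vector(path, index):
    """Map the file and decode only the record at the given index."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)])
second = read_vector(path, 1)
```

Because records have a fixed size, the offset of any vector is a simple multiplication, and the operating system pages in only the parts of the file that are actually touched.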

Async Ingestion

  • Queue-based ingestion with batching
  • Explicit backpressure handling
  • Stable under sustained write load
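The idea of queue-based ingestion with batching and backpressure can be sketched with a bounded `asyncio.Queue`. This is a generic illustration, not Barq DB's ingestion code: the function names, the queue capacity, and the batch size are assumptions chosen for the example.

```python
import asyncio


async def producer(queue, items):
    for item in items:
        # put() suspends while the queue is full; that pause is the backpressure
        await queue.put(item)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, batch_size, sink):
    batch = []
    while True:
        item = await queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            sink.append(batch)  # stand-in for writing a batch to storage
            batch = []
    if batch:
        sink.append(batch)      # flush the final partial batch


async def ingest(items, queue_capacity=4, batch_size=3):
    queue = asyncio.Queue(maxsize=queue_capacity)
    sink = []
    await asyncio.gather(
        producer(queue, items),
        consumer(queue, batch_size, sink),
    )
    return sink


batches = asyncio.run(ingest(list(range(8))))
```

The bounded queue keeps the number of in-flight items fixed, so a slow consumer slows the producer down instead of letting memory grow.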

Segment Lifecycle

  • Explicit lifecycle: Growing → Sealed → Compacted
  • Background compaction
  • Improved long-running stability
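The Growing → Sealed → Compacted lifecycle can be modelled as a small state machine. The sketch below is illustrative only: the class names and the row-count sealing rule are assumptions, not Barq DB's actual segment implementation.

```python
from enum import Enum


class SegmentState(Enum):
    GROWING = "growing"
    SEALED = "sealed"
    COMPACTED = "compacted"


# Each state lists the states it may move to; the lifecycle is one-way
ALLOWED = {
    SegmentState.GROWING: {SegmentState.SEALED},
    SegmentState.SEALED: {SegmentState.COMPACTED},
    SegmentState.COMPACTED: set(),
}


class Segment:
    def __init__(self, max_rows):
        self.state = SegmentState.GROWING
        self.rows = []
        self.max_rows = max_rows  # hypothetical sealing threshold

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

    def append(self, row):
        if self.state is not SegmentState.GROWING:
            raise ValueError("only growing segments accept writes")
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.transition(SegmentState.SEALED)


segment = Segment(max_rows=2)
segment.append("a")
segment.append("b")                           # reaches the threshold and seals
segment.transition(SegmentState.COMPACTED)    # what a background job would do
```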

Hybrid Retrieval

  • Combined vector similarity and BM25 keyword search
  • Reciprocal Rank Fusion (RRF)
  • Deterministic result merging
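To show how Reciprocal Rank Fusion merges a vector ranking with a BM25 ranking, here is a minimal sketch. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used for RRF, and breaking ties by ID is one way to keep merging deterministic; both are assumptions here, not confirmed Barq DB settings.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=10):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is reproducible
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


vector_hits = [3, 1, 2]   # IDs ranked by vector similarity
bm25_hits = [1, 4, 3]     # IDs ranked by BM25 score
fused = rrf_merge([vector_hits, bm25_hits])
```

RRF uses only rank positions, so vector distances and BM25 scores never need to be normalised onto a common scale.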

gRPC-First API

  • proto/barq.proto is the canonical API contract
  • SDKs aligned to gRPC
  • REST maintained for compatibility

Architecture

[Figure: Barq-DB v2 architecture]


Storage and Memory Model

  • Hot segments and indexes may reside in memory
  • Cold data is accessed through mmap-backed storage
  • Memory usage is bounded through configurable limits
  • Eviction policies prevent uncontrolled memory growth
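A bounded memory budget with eviction can be sketched as a byte-limited cache that drops the least recently used segment first. The LRU policy, the class name, and the byte accounting are assumptions for illustration; the eviction policies Barq DB actually implements are not described here.

```python
from collections import OrderedDict


class SegmentCache:
    """Keeps segment data in memory up to a fixed byte budget (LRU eviction)."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # segment_id -> bytes, oldest first

    def get(self, segment_id):
        if segment_id not in self._entries:
            return None  # caller would fall back to mmap-backed storage
        self._entries.move_to_end(segment_id)  # mark as recently used
        return self._entries[segment_id]

    def put(self, segment_id, data):
        if segment_id in self._entries:
            self.used_bytes -= len(self._entries.pop(segment_id))
        self._entries[segment_id] = data
        self.used_bytes += len(data)
        # Evict least recently used entries until the budget holds again
        while self.used_bytes > self.budget_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = SegmentCache(budget_bytes=10)
cache.put("a", b"aaaa")
cache.put("b", b"bbbb")
cache.get("a")            # "a" becomes the most recently used entry
cache.put("c", b"cccc")   # 12 bytes exceed the budget, so "b" is evicted
```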

Durability Model

  • Writes are persisted to the write-ahead log (WAL) before acknowledgment (configurable)
  • Recovery replays WAL into segment state
  • Snapshots and compaction reduce recovery time
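The write-then-acknowledge and replay steps above can be sketched with a line-oriented log. This is a simplified model: the JSON record format, the `op` field, and the function names are assumptions for this example and do not reflect Barq DB's WAL format.

```python
import json
import os
import tempfile


def wal_append(path, record):
    """Append one record and force it to disk before returning (the 'ack')."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def wal_replay(path):
    """Rebuild in-memory state by applying every logged record in order."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record["op"] == "put":
                state[record["id"]] = record["payload"]
            elif record["op"] == "delete":
                state.pop(record["id"], None)
    return state


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal_append(wal_path, {"op": "put", "id": 1, "payload": {"name": "Widget"}})
wal_append(wal_path, {"op": "put", "id": 2, "payload": {"name": "Gadget"}})
wal_append(wal_path, {"op": "delete", "id": 1})
recovered = wal_replay(wal_path)
```

Replay cost grows with the length of the log, which is why snapshots and compaction shorten recovery: they let replay start from a later point.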

Consistency Model (Current)

  • Single-node deployments acknowledge writes with NodeLocal durability
  • Replicated multi-node deployments now route writes through per-shard Raft quorum commit before acknowledgment
  • The runtime consensus path is backed by deterministic Raft leader election, stale-leader rejection, and follower catch-up logic
  • Single-replica multi-node deployments still use routed replication, without quorum durability
  • The current Raft engine is deterministic and in-memory; durable term/log persistence and real inter-node transport are still future work
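Two of the rules named above, quorum commit and stale-leader rejection, follow standard Raft behaviour and can be sketched in a few lines. The function and class names below are illustrative assumptions, not Barq DB's consensus code.

```python
from dataclasses import dataclass


def quorum_size(replicas):
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


def is_committed(acks, replicas):
    """A write commits once a majority of replicas has acknowledged it."""
    return acks >= quorum_size(replicas)


@dataclass
class Follower:
    current_term: int = 0

    def accept_append(self, leader_term):
        # A leader from an older term is stale and must be rejected
        if leader_term < self.current_term:
            return False
        self.current_term = leader_term
        return True


follower = Follower(current_term=3)
stale_ok = follower.accept_append(leader_term=2)    # rejected: older term
fresh_ok = follower.accept_append(leader_term=4)    # accepted: term advances
```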

Benchmarking

Barq-DB v2 includes built-in benchmarking tools.

They are designed to evaluate:

  • Ingestion throughput
  • Query latency (p50 / p95 / p99) from live in-process searches
  • Memory usage under load
  • RSS before and after a benchmark run

The tools support dataset simulations at scale (1M, 10M, and higher).

Benchmark smoke coverage is checked in CI through .github/workflows/benchmarks.yml.
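For reference, the p50 / p95 / p99 latency figures can be derived from raw samples as in the sketch below. It uses the nearest-rank percentile definition with integer percentiles; whether the built-in benchmarking tools use this definition or an interpolating one is an assumption.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Synthetic latencies of 1..100 ms stand in for measured query times
latencies_ms = [float(i) for i in range(1, 101)]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```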


API and SDK

Barq-DB v2 introduces a gRPC-first architecture.

  • gRPC is the primary API surface
  • REST is maintained for compatibility
  • SDKs available in:
    • Python
    • TypeScript
    • Go
    • Rust

SDK Compatibility

  • No breaking changes to existing SDK methods
  • New features exposed via optional parameters

New Capabilities

  • Insert options:
    • wait_for_commit
  • Search options:
    • allow_fallback
    • consistency
  • Async ingestion support
  • Metrics and admin APIs

Quick Start

Run with Docker

docker-compose up -d

Run from Source

cargo run --bin barq-server

Endpoints:


Example (Python)

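from barq import BarqClient

# Connect to a Barq DB server running locally
client = BarqClient("http://localhost:8080", api_key="your-key")

# Create a 384-dimensional collection that uses cosine similarity
client.create_collection(name="products", dimension=384, metric="Cosine")

# Insert one document; the vector is truncated here and must have 384 values
client.insert_document(
    collection="products",
    id=1,
    vector=[0.1, 0.2, ...],
    payload={"name": "Widget"},
)

# query_vector is a 384-dimensional query embedding computed elsewhere
results = client.search(collection="products", vector=query_vector, top_k=10)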

Project Structure

Crate          Description
barq-core      Data structures and catalog
barq-index     HNSW, IVF, SIMD kernels
barq-bm25      Text search engine
barq-storage   WAL, snapshots, persistence
barq-cluster   Sharding and routing
barq-api       gRPC and REST APIs

Reality Check

Barq-DB v2 introduces a stronger and more structured architecture.

However, it still requires continued validation under real-world workloads, particularly for large-scale and distributed scenarios.


License

MIT License
