Antfly

Antfly is a distributed search engine built on etcd's raft library. It combines full-text search (BM25), vector similarity, and graph traversal over multimodal data — text, images, audio, and video. Embeddings, chunking, and graph edges are generated automatically as you write data. Built-in RAG agents tie it all together with retrieval-augmented generation.

Quick Start

# Start a single-node cluster with built-in ML inference
go run ./cmd/antfly swarm

# Or run with Docker
docker run -p 8080:8080 ghcr.io/antflydb/antfly:omni

That gives you the Antfarm dashboard at http://localhost:8080 — playgrounds for search, RAG, knowledge graphs, embeddings, reranking, and more.
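
If you start Antfly from a script, it can help to wait until the dashboard answers before moving on. The sketch below polls the address above using only the Go standard library. The function name `waitForDashboard`, the retry counts, and the "any status below 500 means ready" rule are illustrative assumptions, not part of Antfly.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForDashboard polls url until it answers with a status below 500 or the
// attempts run out. It returns nil on success and a wrapped error otherwise.
// Hypothetical helper: Antfly may expose a dedicated health endpoint instead.
func waitForDashboard(url string, attempts int, delay time.Duration) error {
	if attempts <= 0 {
		return fmt.Errorf("attempts must be positive, got %d", attempts)
	}
	client := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 500 {
				return nil
			}
			lastErr = fmt.Errorf("unexpected status %d", resp.StatusCode)
		} else {
			lastErr = err
		}
		if i < attempts-1 {
			time.Sleep(delay) // back off before the next attempt
		}
	}
	return fmt.Errorf("%s not reachable after %d attempts: %w", url, attempts, lastErr)
}

func main() {
	// http://localhost:8080 is the dashboard address from the quick start.
	if err := waitForDashboard("http://localhost:8080", 3, time.Second); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("Antfarm dashboard is reachable")
}
```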

See the quickstart guide for a full walkthrough.

Features

  • Hybrid search — full-text (BM25), dense vectors, and sparse vectors (SPLADE), all in one query
  • RAG agents — built-in retrieval-augmented generation with streaming, multi-turn chat, tool calling (web search, graph traversal), and confidence scoring
  • Graph indexes — automatic relationship extraction and graph traversal queries over your data
  • Multimodal — index and search images, audio, and video with CLIP, CLAP, and vision-language models
  • Reranking — cross-encoder reranking with score-based pruning to cut the noise
  • Aggregations — stats (sum/min/max/avg) and terms facets for analytics
  • Transactions — ACID transactions at the shard level with distributed coordination
  • Document TTL — automatic document expiration so you don't have to clean up yourself
  • S3 storage — store data in S3/MinIO/R2 for big cost savings and way faster shard splits
  • SIMD / SME acceleration — vector operations use hardware intrinsics via go-highway on x86 and ARM
  • Distributed — Raft consensus, automatic sharding and replication, horizontal scaling
  • Enrichment pipelines — configurable pipelines per index for embeddings, summaries, graph edges, and custom computed fields
  • Bring your own models — Ollama, OpenAI, Bedrock, Google, or run models locally with Termite
  • Auth — built-in user management with API keys, basic auth, and bearer tokens
  • Backup & restore — to local disk or S3
  • Kubernetes operator — deploy and manage clusters with the operator
  • MCP server — Model Context Protocol so LLMs can use Antfly as a tool
  • A2A protocol — Agent-to-Agent support for Google's A2A standard
  • Antfarm — web dashboard with playgrounds for search, RAG, knowledge graphs, embeddings, reranking, chunking, NER, OCR, and transcription
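
To make the hybrid search idea concrete: when one query produces several ranked lists (BM25, dense vectors, sparse vectors), those lists have to be merged into one. The sketch below uses reciprocal rank fusion (RRF), a common fusion technique. This README does not say which fusion method Antfly uses, so treat RRF, the function name `fuseRRF`, and the constant `k = 60` as assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked lists of document IDs with reciprocal rank fusion:
// score(d) = sum over lists of 1 / (k + rank(d)), where rank starts at 1.
// Illustrative only; not Antfly's actual scoring code.
func fuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]] // higher fused score first
		}
		return ids[a] < ids[b] // deterministic tie-break by ID
	})
	return ids
}

func main() {
	// Hypothetical per-retriever rankings for one query.
	bm25 := []string{"doc-a", "doc-b", "doc-c"}
	dense := []string{"doc-b", "doc-d", "doc-a"}
	sparse := []string{"doc-b", "doc-a", "doc-e"}

	fmt.Println(fuseRRF(60, bm25, dense, sparse))
	// prints [doc-b doc-a doc-d doc-c doc-e]
}
```

RRF only looks at ranks, so it sidesteps the problem that BM25 scores and vector similarities live on different scales.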

Documentation

antfly.io/docs

SDKs & Client Libraries

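| Language   | Package                                | Source                 |
|------------|----------------------------------------|------------------------|
| Go         | github.com/antflydb/antfly/pkg/client  | pkg/client             |
| TypeScript | @antfly/sdk                            | ts/packages/sdk        |
| Python     | antfly                                 | py/                    |
| React      | @antfly/components                     | ts/packages/components |
| PostgreSQL | pgaf extension                         | rs/pgaf                |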

pgaf — PostgreSQL Extension

pgaf brings Antfly search into Postgres. Create an index, use the @@@ operator, and you're done:

CREATE INDEX idx_content ON docs USING antfly (content)
  WITH (url = 'http://localhost:8080/api/v1/', collection = 'my_docs');

SELECT * FROM docs WHERE content @@@ 'fix my computer';

React Components

@antfly/components gives you drop-in React components for search UIs — SearchBox, Autosuggest, Facet, Results, RAGBox, AnswerBox, plus streaming hooks like useAnswerStream and useCitations.

Termite — ML Inference

Termite handles the ML side: embeddings, chunking, reranking, classification, NER, OCR, transcription, generation, and more. It ships as a submodule and runs automatically in swarm mode — you don't need to set it up separately.

Libraries & Tools

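| Package       | What it does                                                  | Source            |
|---------------|---------------------------------------------------------------|-------------------|
| docsaf        | Ingest content from filesystem, web crawl, git repos, and S3  | pkg/docsaf        |
| evalaf        | LLM/RAG/agent evaluation ("promptfoo for Go")                 | pkg/evalaf        |
| Genkit plugin | Firebase Genkit integration for retrieval and docstore        | pkg/genkit/antfly |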

Architecture

Antfly uses a multi-raft design with separate consensus groups:

  • Metadata raft — table schemas, shard assignments, cluster topology
  • Storage rafts — one per shard, handling data, indexes, and queries

End-to-end chaos tests — inspired by Jepsen — cover node crashes, leader failures, shard splits under load, and cluster scaling. These tests run real multi-node clusters and inject faults to verify that Raft consensus, transactions, and replication behave correctly under failure.
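
As a rough illustration of what such a test verifies, here is a minimal sketch of one durability check: every write the cluster acknowledged must still be present once faults have healed. The types and names (`op`, `lostWrites`) are hypothetical and are not taken from Antfly's test suite, which likely checks much more than this single invariant.

```go
package main

import "fmt"

// op is one client write recorded during a hypothetical chaos run.
type op struct {
	Key   string
	Acked bool // true if the cluster confirmed the write to the client
}

// lostWrites returns the keys of acknowledged writes that are missing from
// the final state read back after the faults have healed. Unacknowledged
// writes are ignored: they may or may not have been applied.
func lostWrites(history []op, final map[string]bool) []string {
	var lost []string
	for _, o := range history {
		if o.Acked && !final[o.Key] {
			lost = append(lost, o.Key)
		}
	}
	return lost
}

func main() {
	history := []op{
		{Key: "k1", Acked: true},
		{Key: "k2", Acked: false}, // e.g. timed out during a leader failure
		{Key: "k3", Acked: true},
	}

	healthy := map[string]bool{"k1": true, "k3": true}
	fmt.Println("lost after healthy run:", lostWrites(history, healthy))

	broken := map[string]bool{"k1": true, "k2": true}
	fmt.Println("lost after broken run:", lostWrites(history, broken))
}
```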

Critical distributed protocols are formally specified and model-checked with TLA+.

Community

Join the Discord for support, discussion, and updates.

Interested in contributing? See CONTRIBUTING.md.

License

The core server is Elastic License 2.0 (ELv2). That means you can use it, modify it, self-host it, and build products on top of it — you just can't offer Antfly itself as a managed service. Everything else — the SDKs, React components, Termite, pgaf, docsaf, evalaf — is Apache 2.0. We tried to keep as much as possible under a permissive license.