Commit 5f1da9f

Add examples for ColBERT and ColPali ONNX models, and implement SearchAgent functionality
- Introduced new example scripts for the ColBERT and ColPali ONNX models to demonstrate their use in document embedding and semantic matching.
- Added a SearchAgent implementation with a retrieval server and embedding-storage functionality, extending the project's search and retrieval capabilities.
- Updated the README to reflect the new features and examples, improving documentation clarity and usability.
1 parent b0625fe commit 5f1da9f

7 files changed: +630 −244 lines

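The commit message above describes a SearchAgent built on a retrieval server and embedding storage. The commit's own files define the real API; as a purely illustrative sketch (all names here are hypothetical, not the commit's code), the retrieval core amounts to a cosine top-k search over stored embeddings:

```python
import numpy as np

def top_k(query_emb: np.ndarray, store: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k stored embeddings most similar to the query."""
    # Normalize so that a plain dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    s = store / np.linalg.norm(store, axis=1, keepdims=True)
    scores = s @ q
    return np.argsort(scores)[::-1][:k].tolist()

# Toy embedding store: 4 vectors of dimension 3
store = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k(query, store, k=2))  # → [0, 1]
```

A retrieval server would wrap a lookup like this behind an endpoint, and the agent loop would feed the retrieved chunks back into the model's reasoning.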

README.md

Lines changed: 17 additions & 244 deletions
@@ -46,6 +46,8 @@
 
 EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, lightweight, multisource, multimodal, and local embedding pipeline built in Rust. Whether you're working with text, images, audio, PDFs, websites, or other media, EmbedAnything streamlines the process of generating embeddings from various sources and seamlessly streaming them (memory-efficient indexing) to a vector database. It supports dense, sparse, ONNX, model2vec, and late-interaction embeddings, offering flexibility for a wide range of use cases.
 
+
+
 <p align="center">
 <img width=400 src="https://res.cloudinary.com/dogbbs77y/image/upload/v1766251819/streaming_popagm.png">
 </p>
@@ -82,12 +84,14 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
 - **No Dependency on Pytorch** : Easy to deploy on cloud, comes with a low memory footprint.
 - **Highly Modular** : Choose any vectorDB adapter for RAG, with ~~1 line~~ 1 word of code
 - **Candle Backend** : Supports BERT, Jina, ColPali, Splade, ModernBERT, Reranker, Qwen
-- **ONNX Backend**: Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen
-- **Cloud Embedding Models**: Supports OpenAI, Cohere, and Gemini.
+- **ONNX Backend** : Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen
+- **Cloud Embedding Models** : Supports OpenAI, Cohere, and Gemini.
 - **MultiModality** : Works with text sources (PDF, TXT, MD), images (JPG), and audio (.WAV)
 - **GPU support** : Hardware acceleration on GPU as well.
 - **Chunking** : In-built chunking methods like semantic and late-chunking
-- **Vector Streaming** : Separate file processing, indexing, and inference on different threads reduces latency.
+- **Vector Streaming** : Separate file processing, indexing, and inference on different threads reduces latency.
+- **AWS S3 Bucket** : Directly import files from AWS S3 buckets.
+- **SearchAgent** : Example of how you can use the index for Search-R1 reasoning.
 
 ## 💡What is Vector Streaming
 
@@ -109,6 +113,9 @@ The embedding process happens separately from the main process, so as to maintain
 ➡️Supports a range of models: Dense, Sparse, Late-interaction, ReRanker, ModernBert.<br />
 ➡️Memory Management: Rust enforces memory safety, preventing the memory leaks and crashes that can plague other languages <br />
 
+**⚠️ WhichModel has been deprecated in pretrained_hf**
+
+
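The vector-streaming design described above keeps file processing, embedding, and indexing on separate threads connected by bounded queues, so nothing buffers the whole corpus in memory. A minimal stdlib sketch of that pipeline shape (the real pipeline is Rust; the "embedding" here is a toy stand-in):

```python
import queue
import threading

def process_files(files, chunks: queue.Queue):
    # Stage 1: split each file into chunks and hand them off immediately
    for f in files:
        for chunk in f.split(". "):
            chunks.put(chunk)
    chunks.put(None)  # sentinel: no more chunks

def embed_chunks(chunks: queue.Queue, vectors: queue.Queue):
    # Stage 2: embed chunks as they arrive (toy "embedding" = chunk length)
    while (chunk := chunks.get()) is not None:
        vectors.put((chunk, [float(len(chunk))]))
    vectors.put(None)

def index_vectors(vectors: queue.Queue, index: list):
    # Stage 3: stream vectors into the index without buffering everything
    while (item := vectors.get()) is not None:
        index.append(item)

# Bounded queues give back-pressure: memory stays flat however large the input
chunks, vectors, index = queue.Queue(maxsize=8), queue.Queue(maxsize=8), []
stages = [
    threading.Thread(target=process_files, args=(["a b. c d", "e f"], chunks)),
    threading.Thread(target=embed_chunks, args=(chunks, vectors)),
    threading.Thread(target=index_vectors, args=(vectors, index)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(len(index))  # → 3
```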
 ## 🍓 Our Past Collaborations:
 
 We have collaborated with reputed enterprises like
@@ -132,6 +139,8 @@ We support any hugging-face models on Candle, and we also support ONNX runtime f
 
 ## How to add custom model on candle: from_pretrained_hf
 
+**⚠️ WhichModel has been deprecated in from_pretrained_hf**
+
 ```python
 from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
 import embed_anything
@@ -174,14 +183,12 @@ for item in data:
 | Reranker | [Jina Reranker Models](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual), Xenova/bge-reranker, Qwen/Qwen3-Reranker-4B |
 
 
-
-
 ## Splade Models (Sparse Embeddings)
 
 Sparse embeddings are useful for keyword-based retrieval and hybrid search scenarios.
 
 ```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
+from embed_anything import EmbeddingModel, TextEmbedConfig
 import embed_anything
 
 # Load a SPLADE model for sparse embeddings
@@ -211,140 +218,12 @@ ONNX models provide faster inference and lower memory usage. Use the `ONNXModel`
 from embed_anything import EmbeddingModel, WhichModel, ONNXModel, Dtype, TextEmbedConfig
 import embed_anything
 
-# Option 1: Use a pre-configured ONNX model (recommended)
-model = EmbeddingModel.from_pretrained_onnx(
-    WhichModel.Bert,
-    model_id=ONNXModel.BGESmallENV15Q  # Quantized BGE model for faster inference
-)
-
 # Option 2: Use a custom ONNX model from Hugging Face
 model = EmbeddingModel.from_pretrained_onnx(
-    WhichModel.Bert,
+    WhichModel.Bert,
     model_id="onnx_model_link",
     dtype=Dtype.F16  # Use half precision for faster inference
 )
-
-# Embed files with ONNX model
-config = TextEmbedConfig(chunk_size=1000, batch_size=32)
-data = embed_anything.embed_file("test_files/document.pdf", embedder=model, config=config)
-```
-
-### ModernBERT (Quantized)
-
-ModernBERT is a state-of-the-art BERT variant optimized for efficiency.
-
-```python
-from embed_anything import EmbeddingModel, WhichModel, ONNXModel, Dtype
-
-# Load quantized ModernBERT for maximum efficiency
-model = EmbeddingModel.from_pretrained_onnx(
-    WhichModel.Bert,
-    model_id=ONNXModel.ModernBERTBase,
-    dtype=Dtype.Q4F16  # 4-bit quantized for minimal memory usage
-)
-
-# Use it like any other model
-data = embed_anything.embed_file("test_files/document.pdf", embedder=model)
-```
-
-### ColPali (Document Embedding)
-
-ColPali is optimized for document and image-text embedding tasks.
-
-```python
-from embed_anything import ColpaliModel
-import numpy as np
-
-# Load ColPali ONNX model
-model = ColpaliModel.from_pretrained_onnx(
-    "starlight-ai/colpali-v1.2-merged-onnx",
-    None
-)
-
-# Embed a PDF file (ColPali processes pages as images)
-data = model.embed_file("test_files/document.pdf", batch_size=1)
-
-# Query the embedded document
-query = "What is the main topic?"
-query_embedding = model.embed_query(query)
-
-# Calculate similarity scores
-file_embeddings = np.array([e.embedding for e in data])
-query_emb = np.array([e.embedding for e in query_embedding])
-
-# Find most relevant pages
-scores = np.einsum("bnd,csd->bcns", query_emb, file_embeddings).max(axis=3).sum(axis=2).squeeze()
-top_pages = np.argsort(scores)[::-1][:5]
-
-for page_idx in top_pages:
-    print(f"Page {data[page_idx].metadata['page_number']}: {data[page_idx].text[:200]}")
-```
-
-### ColBERT (Late-Interaction Embeddings)
-
-ColBERT provides token-level embeddings for fine-grained semantic matching.
-
-```python
-from embed_anything import ColbertModel
-import numpy as np
-
-# Load ColBERT ONNX model
-model = ColbertModel.from_pretrained_onnx(
-    "jinaai/jina-colbert-v2",
-    path_in_repo="onnx/model.onnx"
-)
-
-# Embed sentences
-sentences = [
-    "The quick brown fox jumps over the lazy dog",
-    "The cat is sleeping on the mat",
-    "The dog is barking at the moon",
-    "I love pizza",
-    "The dog is sitting in the park"
-]
-
-# ColBERT returns token-level embeddings
-embeddings = model.embed(sentences, batch_size=2)
-
-# Each embedding is a matrix: [num_tokens, embedding_dim]
-for i, emb in enumerate(embeddings):
-    print(f"Sentence {i+1}: {sentences[i]}")
-    print(f"Embedding shape: {emb.shape}")  # Shape: (num_tokens, embedding_dim)
-```
-
-### ReRankers
-
-Rerankers improve retrieval quality by re-scoring candidate documents.
-
-```python
-from embed_anything import Reranker, Dtype, RerankerResult, DocumentRank
-
-# Load a reranker model
-reranker = Reranker.from_pretrained(
-    "jinaai/jina-reranker-v1-turbo-en",
-    dtype=Dtype.F16
-)
-
-# Query and candidate documents
-query = "What is the capital of France?"
-candidates = [
-    "France is a country in Europe.",
-    "Paris is the capital of France.",
-    "The Eiffel Tower is in Paris."
-]
-
-# Rerank documents (returns top-k results)
-results: list[RerankerResult] = reranker.rerank(
-    [query],
-    candidates,
-    top_k=2  # Return top 2 results
-)
-
-# Access reranked results
-for result in results:
-    documents: list[DocumentRank] = result.documents
-    for doc in documents:
-        print(f"Score: {doc.score:.4f} | Text: {doc.text}")
 ```
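The ColPali and ColBERT examples removed in this hunk both score documents by late interaction: compare every query token against every document token, keep the best match per query token, and sum. Stripped of the model calls, that MaxSim scoring over token-level embeddings (toy vectors here, not real model output) is just:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    # sims[i, j] = similarity between query token i and doc token j
    sims = query_tokens @ doc_tokens.T
    return float(sims.max(axis=1).sum())

# Token-level embeddings: [num_tokens, embedding_dim]
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.5, 0.5]])  # matches the first query token well
doc_b = np.array([[0.0, 1.0], [0.0, 0.9]])  # matches the second query token well

scores = {name: maxsim(query, d) for name, d in [("a", doc_a), ("b", doc_b)]}
print(scores)  # doc_a scores higher overall
```

The removed ColPali example's `np.einsum("bnd,csd->bcns", ...).max(axis=3).sum(axis=2)` is the same operation batched over multiple queries and pages.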
 
 ### Cloud Embedding Models (Cohere Embed v4)
@@ -368,57 +247,17 @@ model = EmbeddingModel.from_pretrained_cloud(
 data = embed_anything.embed_file("test_files/document.pdf", embedder=model)
 ```
 
-### Qwen 3 - Embedding
-
-Qwen3 supports over 100 languages including various programming languages.
-
-```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig, Dtype
-import numpy as np
-
-# Initialize Qwen3 embedding model
-model = EmbeddingModel.from_pretrained_hf(
-    WhichModel.Qwen3,
-    model_id="Qwen/Qwen3-Embedding-0.6B",
-    dtype=Dtype.F32
-)
-
-# Configure embedding
-config = TextEmbedConfig(
-    chunk_size=1000,
-    batch_size=2,
-    splitting_strategy="sentence"
-)
-
-# Embed a file
-data = model.embed_file("test_files/document.pdf", config=config)
-
-# Query embedding
-query = "Which GPU is used for training"
-query_embedding = np.array(model.embed_query([query])[0].embedding)
-
-# Calculate similarities
-embedding_array = np.array([e.embedding for e in data])
-similarities = np.matmul(query_embedding, embedding_array.T)
-
-# Get top results
-top_5_indices = np.argsort(similarities)[-5:][::-1]
-for idx in top_5_indices:
-    print(f"Score: {similarities[idx]:.4f} | {data[idx].text[:200]}")
-```
 
 ## For Semantic Chunking
 
 Semantic chunking preserves meaning by splitting text at semantically meaningful boundaries rather than fixed sizes.
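One common way to implement the boundary detection described above is to embed consecutive sentences and start a new chunk whenever adjacent similarity drops below a threshold. A toy sketch of that idea (the embeddings are hand-made stand-ins, not model output):

```python
import numpy as np

def semantic_chunks(sentences, embeddings, threshold=0.7):
    """Group consecutive sentences; break when adjacent cosine similarity drops."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(emb[i - 1] @ emb[i]) >= threshold:
            current.append(sentences[i])      # same topic: extend the chunk
        else:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = [sentences[i]]
    chunks.append(" ".join(current))
    return chunks

sentences = ["Cats purr.", "Cats also meow.", "Rust is fast.", "Rust is safe."]
# Hand-made embeddings: the two cat sentences point one way, the Rust ones another
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(semantic_chunks(sentences, emb))  # two chunks, split at the topic change
```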
 
 ```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
+from embed_anything import EmbeddingModel, TextEmbedConfig
 import embed_anything
 
 # Main embedding model for generating final embeddings
 model = EmbeddingModel.from_pretrained_hf(
-    WhichModel.Bert,
     model_id="sentence-transformers/all-MiniLM-L12-v2"
 )
 
@@ -450,7 +289,7 @@ for item in data:
 Late-chunking splits text into smaller units first, then combines them during embedding for better context preservation.
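Concretely, late chunking encodes the whole document first, so every token embedding carries full-document context, and only afterwards pools the token embeddings over each chunk's span. A minimal sketch of the pooling step (the token embeddings are stand-ins for a real encoder's output):

```python
import numpy as np

def late_chunk(token_embeddings: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool contextualized token embeddings over each chunk's token span."""
    return np.stack([token_embeddings[a:b].mean(axis=0) for a, b in spans])

# Token embeddings for a 6-token document, produced by one full-context pass
tokens = np.arange(12, dtype=float).reshape(6, 2)
# Two chunks: tokens 0-2 and tokens 3-5
chunks = late_chunk(tokens, [(0, 3), (3, 6)])
print(chunks.shape)  # → (2, 2): one pooled embedding per chunk
```

The key difference from naive chunking is the order of operations: encode once over the full text, then pool, rather than encoding each chunk in isolation.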
 
 ```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig, EmbedData
+from embed_anything import EmbeddingModel, TextEmbedConfig, EmbedData
 
 # Load your embedding model
 model = EmbeddingModel.from_pretrained_hf(
@@ -506,30 +345,6 @@ os.add_dll_directory("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/b
 | [Benchmarks](https://colab.research.google.com/drive/1nXvd25hDYO-j7QGOIIC0M7MDpovuPCaD?usp=sharing) |
 
 
-# Usage
-
-## ➡️ Usage For 0.3 and later version
-
-### Basic Text Embedding
-
-```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
-import embed_anything
-
-# Load a model from Hugging Face
-model = EmbeddingModel.from_pretrained_local(
-    WhichModel.Bert,
-    model_id="sentence-transformers/all-MiniLM-L12-v2"
-)
-
-# Simple file embedding with default config
-data = embed_anything.embed_file("test_files/test.pdf", embedder=model)
-
-# Access results
-for item in data:
-    print(f"Text chunk: {item.text[:100]}...")
-    print(f"Embedding shape: {len(item.embedding)}")
-```
 
 ### Advanced Usage with Configuration
 
@@ -566,14 +381,6 @@ for item in data:
 ### Embedding Queries
 
 ```python
-from embed_anything import EmbeddingModel, WhichModel
-import embed_anything
-import numpy as np
-
-# Load model
-model = EmbeddingModel.from_pretrained_hf(
-    model_id="sentence-transformers/all-MiniLM-L12-v2"
-)
 
 # Embed a query
 queries = ["What is machine learning?", "How do neural networks work?"]
@@ -588,16 +395,6 @@ for i, query_emb in enumerate(query_embeddings):
 ### Embedding Directories
 
 ```python
-from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
-import embed_anything
-
-# Load model
-model = EmbeddingModel.from_pretrained_hf(
-    model_id="sentence-transformers/all-MiniLM-L12-v2"
-)
-
-# Configure
-config = TextEmbedConfig(chunk_size=1000, batch_size=32)
 
 # Embed all files in a directory
 data = embed_anything.embed_directory(
@@ -609,30 +406,6 @@ data = embed_anything.embed_directory(
 print(f"Total chunks: {len(data)}")
 ```
 
-
-
-### Using ONNX Models
-
-ONNX models provide faster inference and lower memory usage. You can use pre-configured models via the `ONNXModel` enum or load custom ONNX models.
-
-#### Using Pre-configured ONNX Models (Recommended)
-
-```python
-from embed_anything import EmbeddingModel, WhichModel, ONNXModel, Dtype, TextEmbedConfig
-import embed_anything
-
-# Use a pre-configured ONNX model (tested and optimized)
-model = EmbeddingModel.from_pretrained_onnx(
-    WhichModel.Bert,
-    model_id=ONNXModel.BGESmallENV15Q,  # Quantized BGE model
-    dtype=Dtype.Q4F16  # Quantized 4-bit float16
-)
-
-# Embed files
-config = TextEmbedConfig(chunk_size=1000, batch_size=32)
-data = embed_anything.embed_file("test_files/document.pdf", embedder=model, config=config)
-```
-
 #### Using Custom ONNX Models
 
 For custom or fine-tuned models, specify the Hugging Face model ID and path to the ONNX file:
@@ -750,7 +523,7 @@ How to add adapters: https://starlight-search.com/blog/2024/02/25/adapter-deve
 But we're not stopping there! We're actively working to expand this list.
 
 Want to Contribute?
-If you’d like to add support for your favorite vector database, we’d love to have your help! Check out our contribution.md for guidelines, or feel free to reach out directly at turingatverge@gmail.com. Let's build something amazing together! 💡
+If you’d like to add support for your favorite vector database, we’d love to have your help! Check out our contribution.md for guidelines, or feel free to reach out directly at sonam@starlight-search.com. Let's build something amazing together! 💡
 
 ## AWESOME Projects built on EmbedAnything.
 1. A Rust-based, Cursor-like chat-with-your-codebase tool: https://github.com/timpratim/cargo-chat
