EmbedAnything is a minimalist yet highly performant, modular, lightning-fast, lightweight, multisource, multimodal, and local embedding pipeline built in Rust. Whether you're working with text, images, audio, PDFs, websites, or other media, EmbedAnything streamlines the process of generating embeddings from various sources and seamlessly streaming them (memory-efficient indexing) to a vector database. It supports dense, sparse, ONNX, model2vec, and late-interaction embeddings, offering flexibility for a wide range of use cases.

<p align="center">
<img width="400" src="https://res.cloudinary.com/dogbbs77y/image/upload/v1766251819/streaming_popagm.png">
</p>
- **No Dependency on PyTorch**: Easy to deploy on the cloud, with a low memory footprint.
- **Highly Modular**: Choose any vectorDB adapter for RAG, with ~~1 line~~ 1 word of code.
- **Candle Backend**: Supports BERT, Jina, ColPali, Splade, ModernBERT, Reranker, Qwen.
- **ONNX Backend**: Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen.
- **Cloud Embedding Models**: Supports OpenAI, Cohere, and Gemini.
- **MultiModality**: Works with text sources (PDF, TXT, MD), images (JPG), and audio (WAV).
- **GPU Support**: Hardware acceleration on GPU as well.
- **Chunking**: Built-in chunking methods such as semantic and late chunking.
- **Vector Streaming**: File processing, indexing, and inference run on separate threads, reducing latency.
- **AWS S3 Bucket**: Directly import files from AWS S3 buckets.
- **SearchAgent**: An example of using the index for Searchr1 reasoning.

## 💡 What is Vector Streaming?
The embedding process happens separately from the main process, so as to maintain high performance.
➡️ Supports a range of models: dense, sparse, late-interaction, reranker, ModernBERT.<br />
➡️ Memory management: Rust enforces memory safety, preventing the memory leaks and crashes that can plague other languages.<br />
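
The idea above can be sketched as a producer–consumer pipeline: embedding work happens on worker threads while the main thread drains finished vectors into an index. This toy sketch uses plain Python threads and a made-up `fake_embed` stand-in; it only illustrates the threading pattern, not EmbedAnything's Rust internals:

```python
import queue
import threading

def fake_embed(chunk: str) -> list[float]:
    # Stand-in for a real embedding model (hypothetical, for illustration only).
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]

def stream_embeddings(chunks, num_workers=2):
    """Embed chunks on worker threads, then drain results into an index."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()

    def worker():
        while True:
            chunk = tasks.get()
            if chunk is None:                 # Sentinel: no more work.
                break
            results.put(fake_embed(chunk))    # Inference happens off the main thread.

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for chunk in chunks:                      # "File processing": feed chunks as they arrive.
        tasks.put(chunk)
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()

    index = []                                # "Indexing": collect embeddings for the store.
    while not results.empty():
        index.append(results.get())
    return index

index = stream_embeddings(["hello world", "vector streaming", "embed anything"])
print(len(index))  # 3 embeddings, produced concurrently
```

Because the queues decouple the stages, slow file parsing never blocks inference, which is the latency win the bullet list describes.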

**⚠️ WhichModel has been deprecated in from_pretrained_hf**
## 🍓 Our Past Collaborations

We have collaborated with reputed enterprises like
We support any Hugging Face models on Candle, and we also support the ONNX runtime.

## How to add a custom model on Candle: from_pretrained_hf

**⚠️ WhichModel has been deprecated in from_pretrained_hf**

```python
from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig
import embed_anything
# ...
```
| Reranker | [Jina Reranker Models](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual), Xenova/bge-reranker, Qwen/Qwen3-Reranker-4B |

## Splade Models (Sparse Embeddings)

Sparse embeddings are useful for keyword-based retrieval and hybrid search scenarios.
```python
from embed_anything import EmbeddingModel, TextEmbedConfig
import embed_anything

# Load a SPLADE model for sparse embeddings
# ...
```
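
Because SPLADE vectors are almost entirely zeros, they are typically stored as term→weight maps and compared with a sparse dot product over shared dimensions. A minimal illustration with made-up weights (not actual SPLADE output):

```python
def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product over the intersection of non-zero dimensions."""
    if len(b) < len(a):          # Iterate the smaller map for efficiency.
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)

# Toy sparse vectors: term -> weight (illustrative values only).
query = {"rust": 1.2, "embedding": 0.8}
docs = {
    "doc1": {"rust": 0.9, "pipeline": 0.4},
    "doc2": {"python": 0.7, "embedding": 0.5},
}
scores = {name: sparse_dot(query, vec) for name, vec in docs.items()}
print(scores)  # doc1 scores higher: it shares the heavily weighted "rust" term
```

This keyword-level overlap is what makes sparse embeddings a natural complement to dense vectors in hybrid search.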
ONNX models provide faster inference and lower memory usage. Use the `ONNXModel` enum for pre-configured models, or load a custom ONNX model from Hugging Face:

```python
from embed_anything import EmbeddingModel, WhichModel, ONNXModel, Dtype, TextEmbedConfig
import embed_anything
# Use a custom ONNX model from Hugging Face
model = EmbeddingModel.from_pretrained_onnx(
    WhichModel.Bert,
    model_id = "onnx_model_link",
    dtype = Dtype.F16  # Use half precision for faster inference
)
```
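
Late-interaction models such as ColBERT and ColPali (both listed under the ONNX backend above) return one embedding per token or image patch rather than a single pooled vector, and score query–document pairs with MaxSim: each query token is matched against its best document token, and the maxima are summed. A toy illustration in plain Python with hand-written vectors (not real model output, and not the library's scoring code):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def maxsim(query_tokens, doc_tokens):
    """ColBERT-style late-interaction score:
    for each query token, take the best dot product against any doc token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]      # two query-token embeddings
doc_a = [[0.9, 0.1], [0.2, 0.8]]      # document whose tokens align with the query
doc_b = [[-0.5, 0.1], [0.3, -0.2]]    # document with poor token-level alignment
print(maxsim(query, doc_a), maxsim(query, doc_b))
```

Token-level matching is what gives late-interaction models their fine-grained retrieval quality, at the cost of storing a matrix per document instead of a single vector.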

### Cloud Embedding Models (Cohere Embed v4)

```python
model = EmbeddingModel.from_pretrained_cloud(
    # ...
)

data = embed_anything.embed_file("test_files/document.pdf", embedder = model)
```

## For Semantic Chunking

Semantic chunking preserves meaning by splitting text at semantically meaningful boundaries rather than at fixed sizes.

```python
from embed_anything import EmbeddingModel, TextEmbedConfig
import embed_anything

# Main embedding model for generating final embeddings
model = EmbeddingModel.from_pretrained_hf(
    model_id = "sentence-transformers/all-MiniLM-L12-v2"
)
# ...
```
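
Conceptually, semantic chunking embeds candidate units (sentences, say) and starts a new chunk whenever similarity to the previous unit drops below a threshold. The sketch below fakes the embedding step with hand-written vectors so that only the grouping logic is shown; it is not EmbedAnything's implementation:

```python
import math

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den

def semantic_chunks(sentences, vectors, threshold=0.7):
    """Group consecutive sentences; split where cosine similarity dips."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            current.append(sentences[i])      # Same topic: extend the chunk.
        else:
            chunks.append(" ".join(current))  # Topic shift: close the chunk.
            current = [sentences[i]]
    chunks.append(" ".join(current))
    return chunks

sentences = ["Cats purr.", "Kittens meow.", "Rust is fast.", "Cargo builds crates."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy sentence embeddings
print(semantic_chunks(sentences, vectors))  # two chunks, split at the topic change
```

The result respects the topic boundary between the cat sentences and the Rust sentences, which a fixed `chunk_size` split could easily cut through.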
## For Late Chunking

Late chunking runs the encoder over the full text first and only then splits the token embeddings into chunks, so each chunk vector preserves the context of the surrounding text.

```python
from embed_anything import EmbeddingModel, TextEmbedConfig, EmbedData

# Load your embedding model
model = EmbeddingModel.from_pretrained_hf(
    # ...
)
```
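
The pooling step can be sketched as follows: the encoder runs once over the whole text, then the token embeddings are mean-pooled span by span, so every chunk vector was computed with attention over the full document. The toy token vectors below stand in for real encoder output; this is a concept sketch, not the library's code:

```python
def mean_pool(rows):
    """Average a list of equal-length vectors element-wise."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def late_chunk(token_embeddings, boundaries):
    """Pool contiguous token spans AFTER encoding the full sequence.

    boundaries: list of (start, end) token-index pairs, end exclusive.
    """
    return [mean_pool(token_embeddings[s:e]) for s, e in boundaries]

# Pretend the encoder produced one 2-d vector per token for a 5-token text.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 0.0]]
chunk_vectors = late_chunk(tokens, [(0, 2), (2, 5)])
print(chunk_vectors)
```

Contrast this with naive chunking, which would encode each span in isolation and lose any cross-chunk context the encoder could have used.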

| [Benchmarks](https://colab.research.google.com/drive/1nXvd25hDYO-j7QGOIIC0M7MDpovuPCaD?usp=sharing) |

### Advanced Usage with Configuration

### Embedding Queries

```python
queries = ["What is machine learning?", "How do neural networks work?"]
# ...
```
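
Once queries and chunks are embedded, retrieval is usually just a cosine-similarity ranking. A small self-contained sketch with made-up vectors (in practice the vectors would come from `embed_query` and the file-embedding calls above):

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

query_vec = [0.9, 0.1]  # toy query embedding
chunk_vecs = {            # toy chunk embeddings
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.0, 1.0],
    "chunk_c": [0.7, 0.7],
}

# Rank chunks by similarity to the query, best first.
ranked = sorted(chunk_vecs, key=lambda c: cosine(query_vec, chunk_vecs[c]), reverse=True)
print(ranked)  # chunk_a ranks first: it points in nearly the same direction
```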
### Embedding Directories

```python
# Embed all files in a directory
data = embed_anything.embed_directory(
    # ...
)

print(f"Total chunks: {len(data)}")
```
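
Conceptually, directory embedding is a filtered walk over the file tree followed by a per-file embed call. A rough stdlib sketch of the traversal step, with a hypothetical `collect_files` helper (not the crate's actual traversal code):

```python
import os

def collect_files(root: str, extensions: tuple[str, ...]) -> list[str]:
    """Gather files under root whose suffix matches, in sorted order."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Filtering by suffix up front keeps unsupported binary files out of the embedding loop entirely, rather than failing on them file by file.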
#### Using Custom ONNX Models

For custom or fine-tuned models, specify the Hugging Face model ID and the path to the ONNX file:
But we're not stopping there! We're actively working to expand this list.

Want to contribute?
If you’d like to add support for your favorite vector database, we’d love to have your help! Check out our contribution.md for guidelines, or feel free to reach out directly at sonam@starlight-search.com. Let's build something amazing together! 💡

## AWESOME Projects built on EmbedAnything

1. A Rust-based, Cursor-like chat-with-your-codebase tool: https://github.com/timpratim/cargo-chat