Commit b0661d9

2 parents ff13f33 + e84e436

6 files changed

Lines changed: 388 additions & 10 deletions

README.md

Lines changed: 13 additions & 8 deletions
@@ -83,14 +83,13 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
- **No Dependency on Pytorch**: Easy to deploy on cloud, comes with a low memory footprint.
- **Highly Modular**: Choose any vectorDB adapter for RAG, with ~~1 line~~ 1 word of code
- **Backend**: Supports Candle, ONNX, and cloud models
- **MultiModality**: Works with text sources like PDF, txt, and md; images like JPG; and audio like WAV
- **GPU support**: Hardware acceleration on GPU as well.
- **Chunking**: Built-in chunking methods like semantic and late chunking
- **Vector Streaming**: Runs file processing, indexing, and inference on separate threads, reducing latency.
- **AWS S3 Bucket**: Directly import files from AWS S3 buckets.
- **Prebuilt Docker Image**: Just pull it: `starlightsearch/embedanything-server`
- **SearchAgent**: Example of how you can use the index for SearchR1 reasoning.
## 💡What is Vector Streaming
@@ -142,8 +141,9 @@ We support any Hugging Face model on Candle, and we also support ONNX runtime f
**⚠️ WhichModel has been deprecated in from_pretrained_hf**

```python
import embed_anything
from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig

# Load a custom BERT model from Hugging Face
model = EmbeddingModel.from_pretrained_hf(
```
@@ -188,8 +188,9 @@ for item in data:
Sparse embeddings are useful for keyword-based retrieval and hybrid search scenarios.

```python
import embed_anything
from embed_anything import EmbeddingModel, TextEmbedConfig

# Load a SPLADE model for sparse embeddings
model = EmbeddingModel.from_pretrained_hf(
```
@@ -215,8 +216,9 @@ ONNX models provide faster inference and lower memory usage. Use the `ONNXModel`
### BERT Models

```python
import embed_anything
from embed_anything import EmbeddingModel, WhichModel, ONNXModel, Dtype, TextEmbedConfig

# Option 2: Use a custom ONNX model from Hugging Face
model = EmbeddingModel.from_pretrained_onnx(
```
@@ -231,6 +233,7 @@ model = EmbeddingModel.from_pretrained_onnx(
Use cloud models for high-quality embeddings without local model deployment.

```python
import embed_anything
from embed_anything import EmbeddingModel, WhichModel
import os
```
@@ -253,8 +256,8 @@ data = embed_anything.embed_file("test_files/document.pdf", embedder=model)
Semantic chunking preserves meaning by splitting text at semantically meaningful boundaries rather than fixed sizes.

```python
import embed_anything
from embed_anything import EmbeddingModel, TextEmbedConfig

# Main embedding model for generating final embeddings
model = EmbeddingModel.from_pretrained_hf(
```
@@ -289,6 +292,7 @@ for item in data:
Late-chunking splits text into smaller units first, then combines them during embedding for better context preservation.

```python
import embed_anything
from embed_anything import EmbeddingModel, TextEmbedConfig, EmbedData

# Load your embedding model
```
@@ -349,8 +353,8 @@ os.add_dll_directory("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/b
### Advanced Usage with Configuration

```python
import embed_anything
from embed_anything import EmbeddingModel, WhichModel, TextEmbedConfig

# Load model
model = EmbeddingModel.from_pretrained_hf(
```
@@ -411,6 +415,7 @@ print(f"Total chunks: {len(data)}")
For custom or fine-tuned models, specify the Hugging Face model ID and path to the ONNX file:

```python
import embed_anything
from embed_anything import EmbeddingModel, WhichModel, Dtype

# Load a custom ONNX model from Hugging Face
```
Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
---
draft: false
date: 2026-01-11
authors:
  - sonam
slug: release-notes-7
title: Release Notes 0.7
---

# Release Notes 0.7

0.7 is all about making deployment easy and integrating with data sources, including a prebuilt Docker image, a SearchR1 example to improve context for agents, and AWS S3 bucket integration.

<!-- more -->

## 📋 Summary of Changes (0.6.8 → 0.7.0)

This release represents a significant milestone in making EmbedAnything easier to deploy and more powerful for production use. Here's a summary of the key improvements and additions since version 0.6.8:

### 🚀 Major Features

- **Prebuilt Docker Image**: Production-ready Docker image available for immediate deployment
- **AWS S3 Integration**: Direct support for fetching and embedding files from S3 buckets
- **SearchR1 Agent**: Advanced agent framework for interweaving retrieved results to improve context

### 🐛 Bug Fixes & Stability

- **Fixed Parameter Mismatching in Rerank**: Resolved issues with reranking function parameters
- **Fixed Qwen3Embed Concurrency Panic**: Addressed panic issues when using Qwen3 embeddings concurrently
- **Fixed File Extension Handling**: Improved error handling for files without extensions
- **Enhanced Error Handling**: Better error messages and recovery mechanisms

---

## 🐳 Prebuilt Docker Image

We're excited to announce that a prebuilt Docker image is now available! You can now pull the image and spin up the server without building from source. This makes deployment significantly easier and faster.

### Quick Start with Docker

#### Pull the Prebuilt Image

```bash
docker pull starlightsearch/embedanything-server:latest
```

#### Run the Container

```bash
docker run -p 8080:8080 starlightsearch/embedanything-server:latest
```

The server will start on `http://0.0.0.0:8080`.

### Building from Source

If you prefer to build the Docker image yourself, you can use the provided Dockerfile:

```bash
docker build -f server.Dockerfile -t embedanything-server .
docker run -p 8080:8080 embedanything-server
```

### Server Features

The Actix server provides an OpenAI-compatible API for generating embeddings. We chose Actix for:

1. **Blazing fast**: Consistently ranks among the fastest web frameworks in benchmarks like TechEmpower
2. **Asynchronous by default**: Built on Rust's async/await, enabling efficient I/O-bound workloads
3. **Lightweight & modular**: Minimal core with extensible middleware, plugins, and integrations
4. **Type-safe**: Strong type guarantees ensure fewer runtime surprises
5. **Production-ready**: Stable, mature, and already used in industries like fintech, IoT, and SaaS platforms

For benchmarks comparing Python and Rust servers, check out this blog: https://www.jonvet.com/blog/benchmarking-python-rust-web-servers

### API Usage

#### Create Embeddings

**Endpoint:** `POST /v1/embeddings`

**Request:**
```json
{
  "model": "sentence-transformers/all-MiniLM-L12-v2",
  "input": ["The quick brown fox jumps over the lazy dog"]
}
```

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "sentence-transformers/all-MiniLM-L12-v2",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```

#### Health Check

**Endpoint:** `GET /health_check`

Returns a 200 OK status if the server is running.

#### Example Usage with curl

```bash
# Create embeddings
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sentence-transformers/all-MiniLM-L12-v2",
    "input": ["Hello world", "How are you?"]
  }'

# Health check
curl http://localhost:8080/health_check
```

#### Example Usage with Python

```python
import requests

# Create embeddings
response = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "model": "sentence-transformers/all-MiniLM-L12-v2",
        "input": ["The quick brown fox jumps over the lazy dog"]
    }
)

if response.status_code == 200:
    data = response.json()
    print(f"Generated {len(data['data'])} embeddings")
    print(f"First embedding dimension: {len(data['data'][0]['embedding'])}")
else:
    print(f"Error: {response.json()}")
```
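Once you have embeddings back from the server, a common next step is semantic similarity scoring. Here is a minimal, dependency-free sketch; the vectors below are made-up stand-ins for real `data["data"][i]["embedding"]` values from the API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice, take them from the /v1/embeddings response.
query_vec = [0.1, 0.3, 0.5]
doc_vecs = {"doc_a": [0.1, 0.3, 0.5], "doc_b": [0.9, -0.2, 0.0]}

# Rank documents by similarity to the query embedding.
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
print(best)  # doc_a
```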

### Error Handling

The API returns OpenAI-compatible error responses:

```json
{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}
```

For more details, see the [Actix Server Guide](/docs/guides/actix_server.md).
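
When calling the API programmatically, it helps to branch on the two payload shapes shown above. The helper below is purely illustrative (it is not part of EmbedAnything); it just mirrors the documented success and error shapes:

```python
def parse_embeddings_response(payload: dict) -> list[list[float]]:
    """Extract embedding vectors from an OpenAI-compatible response,
    raising if the server returned an error payload instead."""
    if "error" in payload:
        err = payload["error"]
        raise RuntimeError(f"{err.get('type', 'error')}: {err.get('message', '')}")
    return [item["embedding"] for item in payload["data"]]

# A success payload shaped like the /v1/embeddings response above.
ok = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0093]}],
    "model": "sentence-transformers/all-MiniLM-L12-v2",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}
vectors = parse_embeddings_response(ok)
print(len(vectors), len(vectors[0]))  # 1 2
```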

## 🔍 SearchR1 Agent Integration

We've included the SearchR1 agent, which is a powerful method of interweaving retrieved results to improve context. This agent enables more sophisticated reasoning by dynamically integrating search results into the generation process.

### How SearchR1 Works

SearchR1 uses a unique approach where:

- The model conducts reasoning inside `<think>` tags
- When knowledge gaps are identified, it calls a search engine via `<search> query </search>` tags
- Search results are returned between `<information>` and `</information>` tags
- The agent can search multiple times, iteratively refining its understanding
- Once sufficient information is gathered, it provides the answer inside `<answer>` tags

This interweaving of retrieved results with reasoning creates a more contextually aware and accurate response generation process. The agent actively identifies knowledge gaps and explores different perspectives of a topic before providing a final answer.
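
The tag-driven loop described above can be sketched in a few lines. This is an illustrative simulation, not the shipped agent: `model_step` and `retrieve` are stand-ins for a real model call and a real retriever:

```python
import re

def search_r1_loop(model_step, retrieve, max_turns=4):
    """Drive a SearchR1-style loop: the model emits <think>/<search>/<answer>
    tags, and retrieved passages are fed back inside <information> tags."""
    transcript = ""
    for _ in range(max_turns):
        output = model_step(transcript)
        transcript += output
        # Stop once the model commits to an <answer>.
        answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
        if answer:
            return answer.group(1).strip()
        # Otherwise, serve any <search> query back as <information>.
        query = re.search(r"<search>(.*?)</search>", output, re.DOTALL)
        if query:
            passages = retrieve(query.group(1).strip())
            transcript += f"<information>{passages}</information>"
    return None

# Scripted two-turn "model" and a stub retriever, just to show the flow.
turns = iter([
    "<think>I need to look this up.</think><search>capital of France</search>",
    "<think>The passage names Paris.</think><answer>Paris</answer>",
])
result = search_r1_loop(lambda transcript: next(turns),
                        lambda query: "Paris is the capital of France.")
print(result)  # Paris
```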

### Example Usage

The SearchR1 agent is available in our examples. Check out `examples/SearchAgent/` for complete implementation examples showing how to integrate SearchR1 with EmbedAnything's retrieval capabilities using LanceDB.

## ☁️ Direct AWS S3 Bucket Integration

We've added direct integration with AWS S3 buckets, allowing you to fetch and embed files directly from your S3 storage without manual downloads.

### Features

- Fetch files directly from S3 buckets
- Support for explicit credentials or environment variables
- Seamless integration with EmbedAnything's embedding pipeline
- Save files locally or work with them in memory

### Usage

#### Using Explicit Credentials

```python
from embed_anything import S3Client, EmbeddingModel, WhichModel, TextEmbedConfig

# Create S3Client with credentials
s3_client = S3Client(
    access_key_id="your-access-key-id",
    secret_access_key="your-secret-access-key",
    region="us-east-1"
)

# Fetch a file from S3
file = s3_client.get_file_from_s3(
    bucket_name="your-bucket-name",
    key="path/to/your/file.txt"
).save_file()

# Embed the file
embedder = EmbeddingModel.from_pretrained_hf(
    model_id="jinaai/jina-embeddings-v2-small-en"
)
embeddings = embedder.embed_file(
    file,
    config=TextEmbedConfig(
        chunk_size=1000,
        batch_size=32,
        splitting_strategy="sentence"
    )
)
```

#### Using Environment Variables

```python
from embed_anything import S3Client

# Create S3Client from environment variables
# Reads from: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
s3_client = S3Client.from_env()

# Fetch and use files as above
file = s3_client.get_file_from_s3(
    bucket_name="your-bucket-name",
    key="path/to/your/file.pdf"
).save_file()
```

### S3Client Methods

- `get_file_from_s3(bucket_name, key)`: Fetches a file from S3 and returns an `S3File` object
- `S3File.save_file(file_path)`: Saves the file to the local filesystem (optional path parameter)
- `S3File.bytes`: Access the file contents as bytes
- `S3File.key`: Get the S3 key/path

For a complete example, see `examples/s3_example.py`.

---

We're excited about these improvements and look forward to seeing how you use them in your projects! For questions or feedback, please open an issue on our [GitHub repository](https://github.com/StarlightSearch/EmbedAnything).
