Commit 658293d

Merge pull request #201 from StarlightSearch/dev
Video embedding as a feature!!
2 parents: 4c1db2a + 9f07621

23 files changed: 1219 additions & 27 deletions


Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 3 additions & 2 deletions

@@ -91,6 +91,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
 - **AWS S3 Bucket:** Directly import AWS S3 bucket files.
 - **Prebuilt Docker Image:** Just pull it: starlightsearch/embedanything-server
 - **SearchAgent:** Example of how you can use the index for Searchr1 reasoning.
+- **Video guide:** Quick start for frame sampling: https://embed-anything.com/guides/video/

 ## 💡What is Vector Streaming

@@ -478,7 +479,7 @@ We’re excited to share that we've expanded our platform to support multiple mo
 - [x] Images
-- [ ] Videos
+- [x] Videos (frame sampling; enable the `video` feature)
 - [ ] Graph

@@ -498,7 +499,7 @@ We now support both candle and Onnx backend<br/>
 We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, and we want to expand it further.

 ➡️ Graph embedding -- build DeepWalk embeddings depth-first and word2vec <br />
-➡️ Video Embedding <br/>
+➡️ Video embedding improvements (temporal + audio) <br/>
 ➡️ Yolo Clip <br/>
docs/guides/video.md

Lines changed: 86 additions & 0 deletions

@@ -0,0 +1,86 @@

# Video Embeddings (Frame Sampling)

EmbedAnything supports video by sampling frames and embedding them with a vision model (CLIP/SigLIP). This is opt-in via the `video` feature flag and requires the `ffmpeg` CLI to be available on your system. If `ffmpeg` is not on `PATH`, set `FFMPEG_BIN` to the full path of the executable.
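The same lookup order (explicit `FFMPEG_BIN`, then `PATH`) can be verified up front; this is a minimal illustrative sketch, and `resolve_ffmpeg` is not part of the package:

```python
import os
import shutil

def resolve_ffmpeg() -> str:
    """Return the ffmpeg executable to use: FFMPEG_BIN if set, else whatever is on PATH."""
    explicit = os.environ.get("FFMPEG_BIN")
    if explicit:
        return explicit
    found = shutil.which("ffmpeg")
    if found is None:
        raise RuntimeError("ffmpeg not found: install it or set FFMPEG_BIN")
    return found
```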
## Recommended Config

`VideoEmbedConfig` controls how frames are sampled:

- `frame_step`: sample every Nth frame. Default: `30`.
- `max_frames`: maximum frames per video. Default: `300`.
- `batch_size`: frames per embedding batch. Default: `32`.

Suggested starting point:

```python
from embed_anything import VideoEmbedConfig

config = VideoEmbedConfig(frame_step=30, max_frames=300, batch_size=16)
```
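`frame_step` trades coverage for cost: on a 30 fps video the default of 30 keeps roughly one frame per second. A hypothetical helper (not part of the API) for picking a step from a target sampling interval:

```python
def frame_step_for(fps: float, seconds_between_samples: float) -> int:
    """Frame step that samples roughly one frame every `seconds_between_samples` seconds."""
    return max(1, round(fps * seconds_between_samples))

# A 30 fps video sampled every 2 seconds keeps every 60th frame.
print(frame_step_for(30, 2))    # 60
print(frame_step_for(24, 0.5))  # 12
```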
## Python Usage

```python
import embed_anything
from embed_anything import VideoEmbedConfig

model = embed_anything.EmbeddingModel.from_pretrained_hf(
    model_id="openai/clip-vit-base-patch16"
)

config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

data = embed_anything.embed_video_file("path/to/video.mp4", embedder=model, config=config)
```
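As a rough capacity check, the number of frames this call will embed follows from `frame_step` and `max_frames`; `expected_frames` below is an illustrative helper, not a library function:

```python
import math

def expected_frames(total_frames: int, frame_step: int, max_frames: int) -> int:
    """Frames kept by sampling every frame_step-th frame, capped at max_frames."""
    sampled = math.ceil(total_frames / frame_step)
    return min(sampled, max_frames)

# A 60 s clip at 30 fps has 1800 frames; frame_step=30 keeps 60, under the 200 cap.
print(expected_frames(1800, 30, 200))  # 60
```

With `batch_size=16`, those 60 frames would be embedded in 4 batches.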
## Build with Video Support

You must enable the `video` feature and have the `ffmpeg` CLI installed.

### macOS

```bash
brew install ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```

### Linux (Debian/Ubuntu)

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```
### Windows (prebuilt FFmpeg)

1. Download a static build from https://www.gyan.dev/ffmpeg/builds/
2. Extract it and set:

```powershell
$env:FFMPEG_BIN = "C:\path\to\ffmpeg.exe"
```

Then build:

```powershell
cargo build --features video
# Python (maturin)
maturin develop --features "extension-module,video"
```
## Output Metadata

Each embedding includes:

- `video_path`: the source video file
- `frame_index`: the sampled frame index (0-based)
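Since frames are kept every `frame_step` source frames, a sampled frame's approximate timestamp can be recovered from `frame_index`. A hedged sketch, assuming a constant frame rate (`frame_timestamp` is not a library function):

```python
def frame_timestamp(frame_index: int, frame_step: int, fps: float) -> float:
    """Approximate timestamp (seconds) of the frame_index-th sampled frame,
    assuming a constant frame rate and sampling every frame_step-th frame."""
    return frame_index * frame_step / fps

# With frame_step=30 on a 30 fps video, sampled frame 5 is ~5.0 s in.
print(frame_timestamp(5, 30, 30.0))  # 5.0
```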

docs/index.md

Lines changed: 3 additions & 3 deletions

@@ -74,7 +74,7 @@ EmbedAnything is a minimalist, yet highly performant, modular, lightning-fast, l
 - **Candle Backend:** Supports BERT, Jina, ColPali, Splade, ModernBERT, Reranker, Qwen
 - **ONNX Backend:** Supports BERT, Jina, ColPali, ColBERT, Splade, Reranker, ModernBERT, Qwen
 - **Cloud Embedding Models:** Supports OpenAI, Cohere, and Gemini.
-- **MultiModality:** Works with text sources like PDFs, txt, md, Images JPG and Audio, .WAV
+- **MultiModality:** Works with text sources (PDF, txt, md), images (JPG), audio (.WAV), and videos (frame sampling; enable the `video` feature)
 - **GPU support:** Hardware acceleration on GPU as well.
 - **Chunking:** In-built chunking methods like semantic and late chunking
 - **Vector Streaming:** Separate file processing, indexing, and inferencing on different threads, reducing latency.

@@ -339,7 +339,7 @@ We’re excited to share that we've expanded our platform to support multiple mo
 - [x] Images
-- [ ] Videos
+- [x] Videos (frame sampling; enable the `video` feature)
 - [ ] Graph

@@ -359,7 +359,7 @@ We now support both candle and Onnx backend<br/>
 We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, and we want to expand it further.

 ➡️ Graph embedding -- build DeepWalk embeddings depth-first and word2vec <br />
-➡️ Video Embedding <br/>
+➡️ Video embedding improvements (temporal + audio) <br/>
 ➡️ Yolo Clip <br/>
docs/roadmap/roadmap.md

Lines changed: 2 additions & 2 deletions

@@ -17,7 +17,7 @@ We’re excited to share that we've expanded our platform to support multiple mo
 - [x] Images
-- [ ] Videos
+- [x] Videos (frame sampling; enable the `video` feature)
 - [ ] Graph

@@ -58,7 +58,7 @@ To address this, we’re excited to announce that we’re introducing Candle-ONN
 We have had multimodality in our infrastructure from day one. We already support websites, images, and audio, and we want to expand it further.

 ☑️ Graph embedding -- build DeepWalk embeddings depth-first and word2vec <br />
-☑️ Video Embedding <br/>
+☑️ Video embedding improvements (temporal + audio) <br/>
 ☑️ Yolo Clip <br/>
examples/video.py

Lines changed: 37 additions & 0 deletions

@@ -0,0 +1,37 @@
import os
from pathlib import Path

import embed_anything
from embed_anything import EmbedData, VideoEmbedConfig

# Load a vision model (CLIP/SigLIP) for frame embeddings
model = embed_anything.EmbeddingModel.from_pretrained_hf(
    model_id="openai/clip-vit-base-patch16"
)

# Sample every 30th frame (~1 fps for 30 fps videos), cap at 200 frames
config = VideoEmbedConfig(frame_step=30, max_frames=200, batch_size=16)

video_path = os.environ.get("VIDEO_PATH", "path/to/video.mp4")
if not Path(video_path).exists():
    raise FileNotFoundError(
        f"Video not found: {video_path}. Set VIDEO_PATH env var to a valid file."
    )

# Embed a single video
video_embeddings: list[EmbedData] = embed_anything.embed_video_file(
    video_path,
    embedder=model,
    config=config,
)
print(f"Embedded {len(video_embeddings)} frames from video.")

video_dir = os.environ.get("VIDEO_DIR")
if video_dir:
    dir_embeddings = embed_anything.embed_video_directory(
        video_dir,
        embedder=model,
        config=config,
    )
    if dir_embeddings is not None:
        print(f"Embedded {len(dir_embeddings)} total frames from directory.")

mkdocs.yml

Lines changed: 1 addition & 0 deletions

@@ -52,6 +52,7 @@ nav:
   - Guides:
       - guides/colpali.md
       - guides/images.md
+      - guides/video.md
       - guides/semantic.md
       - guides/adapters.md
       - guides/onnx_models.md

processors/Cargo.toml

Lines changed: 2 additions & 0 deletions

@@ -30,9 +30,11 @@ pdf2image = "0.1.3"
 image = "0.25.6"
 thiserror = "2.0.12"
 tempfile = "3.19.1"
+# Video processing (uses external ffmpeg CLI)

 [dev-dependencies]
 tempdir = "0.3.7"

 [features]
 default = []
+video = []

processors/src/lib.rs

Lines changed: 4 additions & 0 deletions

@@ -15,3 +15,7 @@ pub mod html_processor;

 /// This module contains the file processor for DOCX files.
 pub mod docx_processor;
+
+/// This module contains the file processor for video files.
+#[cfg(feature = "video")]
+pub mod video_processor;

processors/src/video_processor.rs

Lines changed: 145 additions & 0 deletions

@@ -0,0 +1,145 @@
#![cfg(feature = "video")]

use anyhow::{anyhow, Result};
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::{fs, path};

#[derive(Debug, Clone, Copy)]
pub enum VideoFrameFormat {
    Jpeg,
    Png,
}

impl VideoFrameFormat {
    fn extension(self) -> &'static str {
        match self {
            VideoFrameFormat::Jpeg => "jpg",
            VideoFrameFormat::Png => "png",
        }
    }
}

#[derive(Debug, Clone)]
pub struct VideoFrame {
    pub index: usize,
    pub path: PathBuf,
}

#[derive(Debug, Clone)]
pub struct VideoProcessor {
    frame_step: usize,
    max_frames: Option<usize>,
    output_format: VideoFrameFormat,
    ffmpeg_bin: Option<PathBuf>,
}

impl VideoProcessor {
    pub fn new(frame_step: usize) -> Self {
        Self {
            frame_step: frame_step.max(1),
            max_frames: None,
            output_format: VideoFrameFormat::Jpeg,
            ffmpeg_bin: None,
        }
    }

    pub fn with_max_frames(mut self, max_frames: usize) -> Self {
        self.max_frames = Some(max_frames);
        self
    }

    pub fn with_output_format(mut self, output_format: VideoFrameFormat) -> Self {
        self.output_format = output_format;
        self
    }

    pub fn with_ffmpeg_bin<P: AsRef<Path>>(mut self, ffmpeg_bin: P) -> Self {
        self.ffmpeg_bin = Some(ffmpeg_bin.as_ref().to_path_buf());
        self
    }

    fn resolve_ffmpeg_bin(&self) -> Result<PathBuf> {
        if let Some(bin) = &self.ffmpeg_bin {
            return Ok(bin.clone());
        }
        if let Ok(bin) = env::var("FFMPEG_BIN") {
            return Ok(PathBuf::from(bin));
        }
        Ok(PathBuf::from("ffmpeg"))
    }

    pub fn extract_frames_to_dir<P: AsRef<Path>, Q: AsRef<Path>>(
        &self,
        video_path: P,
        output_dir: Q,
    ) -> Result<Vec<VideoFrame>> {
        let output_dir = output_dir.as_ref();
        fs::create_dir_all(output_dir)?;

        let ffmpeg_bin = self.resolve_ffmpeg_bin()?;
        let frame_step = self.frame_step.max(1);
        let filter = format!("select=not(mod(n\\,{}))", frame_step);
        let output_pattern = output_dir.join(format!(
            "frame_%06d.{}",
            self.output_format.extension()
        ));

        let mut command = Command::new(ffmpeg_bin);
        command
            .arg("-hide_banner")
            .arg("-loglevel")
            .arg("error")
            .arg("-i")
            .arg(video_path.as_ref())
            .arg("-vf")
            .arg(filter)
            .arg("-vsync")
            .arg("vfr");

        if let Some(max_frames) = self.max_frames {
            command.arg("-vframes").arg(max_frames.to_string());
        }

        let status = command.arg(output_pattern).status()?;
        if !status.success() {
            return Err(anyhow!("ffmpeg failed with exit code {:?}", status.code()));
        }

        let mut frame_paths = fs::read_dir(output_dir)?
            .filter_map(|entry| entry.ok())
            .filter(|entry| entry.file_type().map(|t| t.is_file()).unwrap_or(false))
            .map(|entry| entry.path())
            .filter(|path| {
                path.extension()
                    .and_then(|ext| ext.to_str())
                    .map(|ext| ext.eq_ignore_ascii_case(self.output_format.extension()))
                    .unwrap_or(false)
            })
            .collect::<Vec<path::PathBuf>>();

        frame_paths.sort();

        if frame_paths.is_empty() {
            return Err(anyhow!("No frames extracted from video"));
        }

        let frames = frame_paths
            .into_iter()
            .enumerate()
            .map(|(index, path)| VideoFrame { index, path })
            .collect();

        Ok(frames)
    }

    pub fn extract_frames_to_temp_dir<P: AsRef<Path>>(
        &self,
        video_path: P,
    ) -> Result<(tempfile::TempDir, Vec<VideoFrame>)> {
        let temp_dir = tempfile::TempDir::new()?;
        let frames = self.extract_frames_to_dir(video_path, temp_dir.path())?;
        Ok((temp_dir, frames))
    }
}
