Robust. Hardware-Agnostic. Advanced.
A professional implementation of monocular depth estimation for video, optimized for Google Colab stability and performance.
- Dual Modes: Standard Relative Depth (Visuals) & Metric Depth (Measurements)
- Hardware Smart: Automatically uses GPU (FP16) for speed or CPU (FP32) for compatibility
- 3D Snapshots: Export high-quality 3D Point Clouds (.ply) from any frame
- Robust Engine: Flicker reduction, high-quality FFmpeg encoding, and memory safety
- Flexible Input: Upload videos directly or download from YouTube/URLs
- Click the "Open in Colab" badge above
- Run Cell 1 to install dependencies
- Run Cell 2 to initialize the depth engine
- Configure your settings in Cell 3 and run the dashboard
| Type | Description |
|---|---|
| `Relative` | Best for visual depth maps and artistic effects |
| `Metric` | Outputs actual depth measurements for 3D export |
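As an illustration of how the two modes differ at load time, the sketch below maps a mode/size choice to a Depth Anything V2 checkpoint and builds a `transformers` depth-estimation pipeline. The checkpoint ids are the publicly listed Hugging Face ones (metric checkpoints are split by scene type; the indoor variant is shown) and are an assumption here, not values read from the notebook:

```python
def checkpoint_for(mode: str, size: str) -> str:
    """Map a mode ("Relative"/"Metric") and size ("small"/"base"/"large")
    to a Depth Anything V2 checkpoint id on the Hugging Face Hub.
    NOTE: ids are assumed from the public model listing, not the notebook."""
    size = size.capitalize()  # small -> Small, etc.
    if mode == "Metric":
        # Metric checkpoints come in Indoor/Outdoor variants; Indoor shown.
        return f"depth-anything/Depth-Anything-V2-Metric-Indoor-{size}-hf"
    return f"depth-anything/Depth-Anything-V2-{size}-hf"

def build_depth_pipeline(mode: str = "Relative", size: str = "small"):
    """Construct a depth-estimation pipeline (downloads model weights)."""
    from transformers import pipeline  # deferred: heavy import
    return pipeline("depth-estimation", model=checkpoint_for(mode, size))
```

Calling `build_depth_pipeline("Metric", "small")` would then return a pipeline whose outputs can be interpreted as real distances, which is what the 3D export relies on.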
| Size | Description |
|---|---|
| `small` | Fastest, lower memory usage |
| `base` | Balanced performance |
| `large` | Best quality, higher memory usage |
| Option | Description |
|---|---|
| `Native` | Original video resolution |
| `720p` | HD resolution |
| `480p` | Standard definition (recommended for T4 GPUs) |
| `360p` | Fastest processing |
Generate 3D point clouds (.ply files) from any frame in your video:
- Set `GENERATE_SNAPSHOT = True`
- Set `SNAPSHOT_TIME` to the desired timestamp (in seconds)
- The `.ply` file will be automatically downloaded after processing
Point clouds can be viewed in software like MeshLab, Blender, or CloudCompare.
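Conceptually, the snapshot step back-projects a metric depth map through a pinhole camera model and writes the resulting points as an ASCII `.ply` file. The sketch below shows the idea; the focal lengths `fx`/`fy` and the centred principal point are placeholder assumptions, not the notebook's actual calibration:

```python
import numpy as np

def depth_to_ply(depth: np.ndarray, path: str, fx: float = 500.0, fy: float = 500.0) -> None:
    """Back-project a metric depth map (metres) through a pinhole model and
    write an ASCII .ply point cloud. fx/fy are placeholder focal lengths;
    the principal point is assumed to be the image centre."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(pts)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("end_header\n")
        for px, py, pz in pts:
            f.write(f"{px:.4f} {py:.4f} {pz:.4f}\n")
```

ASCII `.ply` output like this opens directly in MeshLab, Blender, or CloudCompare.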
| File | Description |
|---|---|
| `depth_launcher.ipynb` | Main Colab notebook with all-in-one setup |
The notebook automatically installs all required packages (PyTorch is pre-installed in Colab):
- `transformers` (from source) - Hugging Face model loading
- `accelerate` - Optimized inference
- `opencv-python` - Video processing
- `yt-dlp` - YouTube/URL video downloading
- `pillow` - Image processing
- `numpy` - Numerical computations
- Use `480p` resolution for T4 GPUs to avoid memory issues
- The `small` model provides good results with faster processing
- Enable temporal smoothing (default: 3 frames) for flicker-free output
- Metric mode is recommended when exporting 3D point clouds
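Temporal smoothing of the kind mentioned above can be as simple as a moving average over the last few depth maps. A minimal sketch with the default 3-frame window (the notebook's exact smoothing scheme may differ):

```python
from collections import deque
import numpy as np

class TemporalSmoother:
    """Moving average over the last `window` depth maps, to suppress
    frame-to-frame flicker in the predicted depth."""

    def __init__(self, window: int = 3):
        # deque(maxlen=...) automatically evicts the oldest frame
        self.frames: deque = deque(maxlen=window)

    def smooth(self, depth: np.ndarray) -> np.ndarray:
        self.frames.append(depth.astype(np.float32))
        return np.mean(np.stack(self.frames), axis=0)
```

A larger window smooths more aggressively but lags behind fast scene motion, which is why a short window like 3 is a reasonable default.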
This project uses the Depth Anything V2 model from Hugging Face Transformers.