1Graz University of Technology
2University of Applied Sciences Technikum Wien
3University of Natural Resources and Life Sciences Vienna
OTAS is an open-vocabulary segmentation and semantic reconstruction model that aligns foundation model output tokens across one or multiple input views without any training. This results in a parameter-efficient and lightweight language embedding approach, especially effective for outdoor tasks. Ready to dive in?
Prerequisites
- 📂 Clone this repository:
git clone --recursive https://github.com/SimonSchwaiger/otas - 📦 Install dependencies:
pip install -r requirements.txt - ⬇️ Make sure you have wget installed and run the
download_checkpoints.shscript - 🎯 Import inference helper from
src/inference.py. Seedemo.ipynbfor usage!
📁 Repository Structure
- 🔧
src/inference.py- Your main inference helper functions (check outdemo.ipynbfor examples!) - 🧠
src/model.py- The core model with language embedding and reconstruction - ⚙️
src/model_config.py- Model configuration loader (you can create custom configs as JSON files and point to them withOTAS_CONFIG_PATH)
🎮 Inference Helper
We include a demo notebook that shows how to use the convenience wrappers for easy inference. You can even import from and export to Nerfstudio datasets for easy reconstruction! See the notebook at ./demo.ipynb.
With the repository downloaded, inference just takes a few lines of code by initialising OTAS from VGGT. Here is a reconstruction of a hiking trail in the alps.
The input images look something like this:
Using OTAS, a semantic map can easily be reconstructed, allowing for open-vocabulary queries! 🗺️
import open3d as o3d
import sys; sys.path.append("otas/src")
from inference import spatial_inference, all_imgs_in_dir
img_paths = all_imgs_in_dir("./img/outdoor_reconstruction")
model = spatial_inference(vggt_image_paths = img_paths)
## Geometry
pcd = model.cleanup_overexposed_pcd(model.pcd_colour)
o3d.io.write_point_cloud("./otas_reconstruction_geometric.ply", pcd)
## PCA
pcd_pca = model.visualise_pca_pcd()
o3d.io.write_point_cloud("./otas_reconstruction_pca.ply", pcd_pca)
## Open-Vocabulary Query
pcd = model.query_relevance_pcd(["wooden bridge"], ["object"])
o3d.io.write_point_cloud("./otas_reconstruction_similarity.ply", pcd)Here are the results! 🎉 The following images are a top-view over the geometric reconstruction (left), PCA over language embeddings (middle), and semantic similarity to the prompt "wooden bridge" (right):
The included ROS 2 node in src/ros2_node.py allows for real-time semantic mapping and simultaneous open-vocabulary queries. It requires RGB, depth and camera info topics to be published. Camera poses are tracked from TF.
- Install ROS 2 (tested with Humble)
- Set up OTAS in a virtual environment with dependencies installed and checkpoints downloaded (see Getting Started)
- Install ROS 2 dependencies:
apt update && apt install -y ros-$ROS_DISTRO-sensor-msgs-py ros-$ROS_DISTRO-message-filters ros-$ROS_DISTRO-cv-bridge. Depending on the provided message-filters and cv-bridge binaries, you may need to downgrade numpy:pip install "numpy<2" - Adjust configuration file (e.g.
src/model_config/OTAS_ros2.json) to your sensor setup. The following settings must match your ROS 2 environment:
ros_world_frame- World coordinate frame name (will be the common frame for all published pointclouds). If you run a SLAM alongside to track camera poses, make sure that TF updates from this (world) frame to the camera optical frame are time-synchronized with the camera streams.ros_camera_optical_frame- Camera optical frame. Make sure that (matching ROS 2 conventions) this frame follows the OpenCV convention of z-forward, y-down, x-right. Most camera drivers publish a static transform from the camera base frame to this optical frame.ros_rgb_topic- RGB image streamros_depth_topic- Depth image streamros_camera_info_topic- Camera info topic (use the depth camera's info here as published by the depth camera driver)
- Run the ROS 2 node:
python ros2_node.py
Published Topics
The ROS 2 node publishes the following topics. All are of type Pointcloud2 and in the world frame:
/otas/pointcloud/geometric- Geometric pointcloud/otas/pointcloud/pca- PCA over language embeddings/otas/pointcloud/similarity- Semantic similarity to a given prompt/otas/pointcloud/mask- Semantic segmentation mask/otas/pointcloud/cluster- Semantic clusters
Subscribed Topics
/otas/query- Query prompt to trigger publishing of similarity and mask pointclouds
Parameters
/otas/save_map_path- Path to save the map to disk. Set usingros2 param set /otas_frame_buffer otas_save_map_path "<path>"
Services
/otas/save_map- Save the current map to disk in Nerfstudio format. Saves by default to<otas_dir>/saved_maps/<timestamp>. Save location can be changed by setting the/otas/save_map_pathparameter. Call usingros2 service call otas/save_map std_srvs/srv/Trigger/otas/clear_map- Clear the mapped keyframes. Call usingros2 service call otas/clear_map std_srvs/srv/Trigger
If you use this work in your research, please cite our paper:
@misc{Schwaiger2025OTAS,
title = {OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation. \textit{arXiv preprint arXiv:2507.08851}},
author = {Simon Schwaiger and Stefan Thalhammer and Wilfried Wöber and Gerald Steinbauer-Wagner},
year = {2025},
url = {https://arxiv.org/abs/2507.08851}
}Thanks to these repositories and works! Without them, this research wouldn't have been possible:





