OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation

Simon Schwaiger¹,², Stefan Thalhammer², Wilfried Wöber²,³ and Gerald Steinbauer-Wagner¹

¹Graz University of Technology
²University of Applied Sciences Technikum Wien
³University of Natural Resources and Life Sciences Vienna

🌐 otas-segmentation.github.io

🚀 Getting Started

OTAS is an open-vocabulary segmentation and semantic reconstruction model that aligns foundation model output tokens across one or multiple input views without any training. This results in a parameter-efficient and lightweight language embedding approach, especially effective for outdoor tasks. Ready to dive in?

Prerequisites

📂 Clone this repository: git clone --recursive https://github.com/SimonSchwaiger/otas
📦 Install dependencies: pip install -r requirements.txt
⬇️ Make sure you have wget installed and run the download_checkpoints.sh script
🎯 Import inference helper from src/inference.py. See demo.ipynb for usage!

📁 Repository Structure

🔧 src/inference.py - Your main inference helper functions (check out demo.ipynb for examples!)
🧠 src/model.py - The core model with language embedding and reconstruction
⚙️ src/model_config.py - Model configuration loader (you can create custom configs as JSON files and point to them with OTAS_CONFIG_PATH)
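
The custom-config workflow could look roughly like the sketch below. It assumes that OTAS_CONFIG_PATH is read as an environment variable when the configuration is loaded (check src/model_config.py to confirm). The helper names write_custom_config and use_config are hypothetical and not part of the repository.

```python
import json
import os


def write_custom_config(base_path, overrides, out_path):
    """Copy a JSON config, apply key overrides, and write the result to out_path."""
    with open(base_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Reject keys that the base config does not define, so a typo cannot
    # silently create a new, unused setting.
    unknown = [key for key in overrides if key not in config]
    if unknown:
        raise KeyError(f"keys not present in base config: {unknown}")
    config.update(overrides)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return out_path


def use_config(path):
    """Point OTAS at a config file (assumption: the path is read from the environment)."""
    os.environ["OTAS_CONFIG_PATH"] = os.path.abspath(path)
```

A typical use would be to copy one of the shipped JSON files, override a few keys, and call use_config on the result before the model is constructed. The exact moment at which the variable is read is an assumption here.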

🎮 Inference Helper

We include a demo notebook that shows how to use the convenience wrappers for easy inference. You can even import from and export to Nerfstudio datasets for easy reconstruction! See the notebook at ./demo.ipynb.

🏔️ Few-Line Environment Reconstruction Example

With the repository downloaded, inference takes just a few lines of code by initialising OTAS from VGGT. Here is a reconstruction of a hiking trail in the Alps.

The input images look something like this:

Using OTAS, a semantic map can easily be reconstructed, allowing for open-vocabulary queries! 🗺️

import open3d as o3d
import sys; sys.path.append("otas/src")
from inference import spatial_inference, all_imgs_in_dir

img_paths = all_imgs_in_dir("./img/outdoor_reconstruction")
model = spatial_inference(vggt_image_paths = img_paths)

## Geometry
pcd = model.cleanup_overexposed_pcd(model.pcd_colour)
o3d.io.write_point_cloud("./otas_reconstruction_geometric.ply", pcd)

## PCA
pcd_pca = model.visualise_pca_pcd()
o3d.io.write_point_cloud("./otas_reconstruction_pca.ply", pcd_pca)

## Open-Vocabulary Query
pcd = model.query_relevance_pcd(["wooden bridge"], ["object"])
o3d.io.write_point_cloud("./otas_reconstruction_similarity.ply", pcd)

Here are the results! 🎉 The following images show top views of the geometric reconstruction (left), PCA over language embeddings (middle), and semantic similarity to the prompt "wooden bridge" (right):
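
To compare several prompts against the same reconstruction, the query step from the example can be wrapped in a loop. This is a minimal sketch: export_queries and prompt_to_filename are hypothetical helpers, and the second list (["object"]) is passed through unchanged from the example without assuming anything about its meaning beyond what src/model.py defines.

```python
import re


def prompt_to_filename(prompt, prefix="otas_reconstruction_similarity"):
    """Turn a free-text prompt into a safe .ply file name."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"./{prefix}_{slug}.ply"


def export_queries(model, prompts, second_arg=("object",), write_fn=None):
    """Query the model once per prompt and write each result to its own file."""
    if write_fn is None:
        # Imported lazily so the helper can be exercised without open3d installed.
        import open3d as o3d
        write_fn = o3d.io.write_point_cloud
    paths = []
    for prompt in prompts:
        # Same call as in the example above, with one prompt per query.
        pcd = model.query_relevance_pcd([prompt], list(second_arg))
        path = prompt_to_filename(prompt)
        write_fn(path, pcd)
        paths.append(path)
    return paths
```

With the model from the example, export_queries(model, ["wooden bridge", "hiking trail"]) would write one similarity point cloud per prompt and return the file paths.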

🤖 Real-Time ROS 2 Integration

The included ROS 2 node in src/ros2_node.py allows for real-time semantic mapping and simultaneous open-vocabulary queries. It requires RGB, depth and camera info topics to be published. Camera poses are tracked from TF.

Install ROS 2 (tested with Humble)
Set up OTAS in a virtual environment with dependencies installed and checkpoints downloaded (see Getting Started)
Install ROS 2 dependencies: apt update && apt install -y ros-$ROS_DISTRO-sensor-msgs-py ros-$ROS_DISTRO-message-filters ros-$ROS_DISTRO-cv-bridge. Depending on the provided message-filters and cv-bridge binaries, you may need to downgrade numpy: pip install "numpy<2"
Adjust the configuration file (e.g. src/model_config/OTAS_ros2.json) to your sensor setup. The following settings must match your ROS 2 environment:

ros_world_frame - World coordinate frame name (the common frame for all published pointclouds). If you run a SLAM system alongside OTAS to track camera poses, make sure that TF updates from this (world) frame to the camera optical frame are time-synchronized with the camera streams.
ros_camera_optical_frame - Camera optical frame. Make sure that (matching ROS 2 conventions) this frame follows the OpenCV convention of z-forward, y-down, x-right. Most camera drivers publish a static transform from the camera base frame to this optical frame.
ros_rgb_topic - RGB image stream
ros_depth_topic - Depth image stream
ros_camera_info_topic - Camera info topic (use the depth camera's info here as published by the depth camera driver)
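
As a sanity check for the optical-frame setting, the axis convention can be written out explicitly. The sketch below is not part of OTAS. It assumes the camera base frame follows the usual ROS body convention (x forward, y left, z up) and shares its origin with the optical frame; the function names are illustrative.

```python
def body_to_optical(point):
    """Re-express a point given in a ROS body frame (x forward, y left, z up)
    in an optical frame (x right, y down, z forward)."""
    x, y, z = point
    # right = -left, down = -up, forward stays forward
    return (-y, -z, x)


def optical_to_body(point):
    """Inverse mapping: optical frame coordinates back to the body frame."""
    x, y, z = point
    return (z, -x, -y)
```

A point one metre straight ahead of the camera, (1, 0, 0) in the body frame, should end up on the optical z axis. If points in the published pointclouds appear rotated by 90 degrees, a frame that does not follow this convention is a likely cause.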

Run the ROS 2 node: python ros2_node.py

Published Topics

The ROS 2 node publishes the following topics. All are of type PointCloud2 (sensor_msgs/msg/PointCloud2) and in the world frame:

/otas/pointcloud/geometric - Geometric pointcloud
/otas/pointcloud/pca - PCA over language embeddings
/otas/pointcloud/similarity - Semantic similarity to a given prompt
/otas/pointcloud/mask - Semantic segmentation mask
/otas/pointcloud/cluster - Semantic clusters

Subscribed Topics

/otas/query - Query prompt to trigger publishing of similarity and mask pointclouds
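
A query can be sent from a script by wrapping the ros2 command-line tool. This is only a sketch: the message type of /otas/query is not stated here, so std_msgs/msg/String is an assumption that should be checked with ros2 topic info /otas/query. The helper names are hypothetical.

```python
import json
import subprocess


def query_command(prompt, topic="/otas/query", msg_type="std_msgs/msg/String"):
    """Build a `ros2 topic pub --once` command that sends a single query prompt.

    msg_type is an assumption; verify it against the running node.
    """
    # JSON is valid YAML, so json.dumps takes care of quoting the prompt text.
    payload = json.dumps({"data": prompt})
    return ["ros2", "topic", "pub", "--once", topic, msg_type, payload]


def send_query(prompt):
    """Run the command; requires a sourced ROS 2 environment."""
    subprocess.run(query_command(prompt), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is published.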

Parameters

/otas/save_map_path - Path to save the map to disk. Set using ros2 param set /otas_frame_buffer otas_save_map_path "<path>"

Services

/otas/save_map - Save the current map to disk in Nerfstudio format. Saves by default to <otas_dir>/saved_maps/<timestamp>. Save location can be changed by setting the /otas/save_map_path parameter. Call using ros2 service call otas/save_map std_srvs/srv/Trigger
/otas/clear_map - Clear the mapped keyframes. Call using ros2 service call otas/clear_map std_srvs/srv/Trigger
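
The parameter and service calls above can be chained from a script, for example to save a map to a chosen directory at the end of a run. A minimal sketch, using the node, parameter, and service names exactly as they appear in the commands above; save_map_commands and save_map are hypothetical helper names.

```python
import subprocess

NODE = "/otas_frame_buffer"    # node name from the `ros2 param set` command above
PARAM = "otas_save_map_path"   # parameter name from the same command


def save_map_commands(save_path=None):
    """Return the ros2 CLI commands needed to save the current map."""
    commands = []
    if save_path is not None:
        # Optional: change the save location before triggering the save.
        commands.append(["ros2", "param", "set", NODE, PARAM, save_path])
    commands.append(["ros2", "service", "call", "otas/save_map", "std_srvs/srv/Trigger"])
    return commands


def save_map(save_path=None):
    """Run the commands in order; requires a sourced ROS 2 environment."""
    for command in save_map_commands(save_path):
        subprocess.run(command, check=True)
```

Calling save_map() without arguments leaves the save location at its default.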

📄 Citation

If you use this work in your research, please cite our paper:

@misc{Schwaiger2025OTAS,
    title               = {OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation. \textit{arXiv preprint arXiv:2507.08851}}, 
    author              = {Simon Schwaiger and Stefan Thalhammer and Wilfried Wöber and Gerald Steinbauer-Wagner},
    year                = {2025},
    url                 = {https://arxiv.org/abs/2507.08851}
}

🙏 Acknowledgement

Thanks to these repositories and works! Without them, this research wouldn't have been possible:
