ROS 2 stack for a Unitree A1 quadruped with an onboard Jetson Orin Nano, providing:
- real-time human detection and pose estimation (YOLO-pose, ONNX Runtime);
- gesture recognition from skeletons (ST-GCN);
- high-level locomotion commands for the A1;
- the ability to run on the real robot or locally on a GPU PC for neural network debugging.
- An RGB camera (on the Jetson / local machine) publishes a video stream.
- `f_human_detection2`:
  - runs the pose detection model (YOLO-pose in ONNX format);
  - detects people and skeletal keypoints;
  - publishes `PersonBodyArray` messages for downstream modules;
  - optionally publishes a debug image and RViz configuration.
- `f_gesture_recognition`:
  - accumulates temporal sequences of skeletons;
  - classifies gestures using ST-GCN;
  - publishes `PersonAction` messages.
- `f_fox_command`:
  - converts gestures and scene state into motion commands;
  - generates `FoxCommand` messages (velocity, turn rate, modes, etc.).
- `f_a1_lcm_control`:
  - receives `FoxCommand`;
  - converts it to Unitree SDK format (via LCM + `unitree_legged_sdk`);
  - sends low-level commands to the A1.
- `unitree_ros2` and `unitree_hardware` provide integration with the physical robot and simulation.
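The gesture-recognition stage of this pipeline can be sketched in plain Python. The message types and the window size below are illustrative stand-ins, not the actual `f_interfaces` definitions:

```python
from collections import deque
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the f_interfaces messages.
@dataclass
class PersonBody:
    keypoints: List[float]  # flattened (x, y, confidence) triples

@dataclass
class PersonAction:
    label: str
    confidence: float

class GestureBuffer:
    """Accumulates per-frame skeletons until a temporal window is
    full, as f_gesture_recognition does before running ST-GCN."""

    def __init__(self, window_size: int = 30):
        self.window = deque(maxlen=window_size)

    def push(self, body: PersonBody):
        self.window.append(body)

    def ready(self) -> bool:
        # The classifier only runs on a complete temporal window.
        return len(self.window) == self.window.maxlen

buf = GestureBuffer(window_size=3)
for _ in range(3):
    buf.push(PersonBody(keypoints=[0.0] * 51))  # 17 keypoints x 3
print(buf.ready())  # True once the window is full
```

In the real stack the full window would be fed to the ST-GCN model and the result published as a `PersonAction`.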
Main top-level directories:
- `f_human_detection2/` — human pose detection:
  - `f_human_detection2/pose_node.py` — main ROS 2 node;
  - `f_human_detection2/backends/` — inference backends:
    - `yolo11_pose_onnxrt.py` — ONNX Runtime + CUDA;
    - `yolo11_pose_trt.py` — experimental TensorRT backend (disabled by default);
  - `config/dev.yaml`, `config/prod.yaml` — configs for local and onboard modes;
  - `launch/dev.launch.py`, `launch/prod.launch.py`, `launch/dummy.launch.py` — ready-to-use launch files;
  - `models/` — ONNX models and auxiliary files;
  - `rviz/pose_debug.rviz` — RViz config for debugging.
- `f_gesture_recognition/` — gesture recognition:
  - `pose_classifier.py` — ST-GCN inference on skeletons;
  - `models/` — ST-GCN models (pretrained and custom);
  - `examples/people.yaml` — example labels/config for gestures.
- `f_fox_command/` — high-level command logic:
  - `command_node.py` — ROS 2 node mapping gestures to A1 commands.
- `f_a1_lcm_control/` — LCM bridge to the Unitree A1:
  - `a1_lcm_control_node.py` — bridge between ROS 2 messages and the Unitree SDK.
- `f_bringup/` — launching the full stack:
  - `launch/bringup.launch.py` — perception + gesture + command + control;
  - `launch/tf.launch.py` — TF tree configuration.
- `f_interfaces/` — custom ROS 2 messages:
  - `msg/FoxCommand.msg` — robot commands;
  - `msg/PersonBody.msg`, `msg/PersonBodyArray.msg` — skeletons;
  - `msg/PersonAction.msg` — recognized actions/gestures.
- `ml_training/` — gesture model training:
  - `prepare_dataset.ipynb` — dataset preparation from skeletons;
  - `stgcn_custom.py` — ST-GCN fine-tuning.
- `unitree_legged_sdk/`, `unitree_ros2/` — external Unitree code:
  - SDK, drivers, and ROS 2 integration for the A1;
  - typically left unchanged unless really needed.
- `util/` — utility scripts:
  - `animate_skeleton.py` — skeleton visualization;
  - `converse.py`, `lcm_setup.sh` — service scripts.
- `Dockerfile`, `DockerfileJetson` — base images for x86_64 development and Jetson builds.
- `docker-compose.yaml` — services for running the stack in Docker.
- `requirements-docker.txt` — Python dependencies for containers.
Recommended setup:
- Unitree A1 quadruped (controlled via Unitree SDK).
- Onboard computer:
  - NVIDIA Jetson Orin Nano (or a similar Jetson board) with a CUDA-capable GPU;
  - or a desktop/server with an NVIDIA GPU (for local debugging).
- RGB camera:
  - USB/CSI camera exposed as `/dev/videoX`, or the robot's onboard camera.
- Developer host machine (Linux, ideally Ubuntu 20.04/22.04).
- Docker ≥ 20.10.
- Docker Compose (v2, `docker compose`).
- NVIDIA GPU driver + CUDA (on the host, optional).
- NVIDIA Container Toolkit (`nvidia-container-toolkit`) for CUDA in Docker (optional).
- Git with submodule support.
For optional non-Docker runs:
- Python 3.10+;
- ROS 2 (matching the `package.xml` requirements);
- `colcon` + standard ROS 2 build tools;
- ONNX Runtime with CUDA support and (optionally) the TensorRT Execution Provider.
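When ONNX Runtime is installed with GPU support, the detection backend has to pick an execution provider. A minimal sketch of that preference logic (the selection order is an assumption based on this repo's backends; the provider names are the standard ONNX Runtime ones):

```python
def select_providers(available):
    """Order execution providers by preference: TensorRT first,
    then CUDA, then CPU fallback. At runtime `available` would come
    from onnxruntime.get_available_providers()."""
    preference = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    return [p for p in preference if p in available]

# On a CUDA-only machine:
print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list would be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.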
```bash
git clone --recurse-submodules [email protected]:Innopolis-Robotics-Society/project_fabian.git
cd project_fabian
```

If it was cloned earlier without submodules:

```bash
git submodule update --init --recursive
```

- Install the NVIDIA driver and CUDA for your OS/GPU.
- Install NVIDIA Container Toolkit (see NVIDIA docs for details).
- Configure Docker to use the NVIDIA runtime and restart the Docker daemon.
- Build and start the dev container:

  ```bash
  # TODO: adjust service name to match docker-compose.yaml
  docker compose up --build <service>
  ```

  > Note: there are separate services for CPU-only and CUDA runs; check `docker-compose.yaml` for the exact service names.
- Attach another shell to the running container:

  ```bash
  docker compose exec <service> bash
  ```
- Inside the container (once after code/dependency changes):

  ```bash
  colcon build --symlink-install
  source install/setup.bash
  ```

- Run the detection/gesture stack with a webcam:

  ```bash
  # TODO: adjust launch arguments as needed
  ros2 launch f_human_detection2 dev.launch.py
  # in another terminal inside the container:
  ros2 run f_gesture_recognition pose_classifier
  ```
- For model experiments, use `ml_training/`:
  - open `prepare_dataset.ipynb` in Jupyter;
  - run `stgcn_custom.py` for fine-tuning.
- Build and start the dev container:

  ```bash
  # TODO: adjust service name to match docker-compose.yaml
  docker compose up --build fox
  ```

- Attach another shell to the running container:

  ```bash
  docker compose exec fox bash
  ```

- Inside the container:

  ```bash
  colcon build --symlink-install
  source install/setup.bash
  ```

- Start the Unitree low-level components (ROS 2 + SDK):

  ```bash
  # TODO: fill with actual commands from unitree_ros2/unitree_hardware
  ```
Main configs:
- `f_human_detection2/config/dev.yaml` — local mode:
  - camera selection (`/dev/videoX` or a ROS image topic);
  - input frame size, FPS;
  - detection thresholds, NMS, etc.
- `f_human_detection2/config/prod.yaml` — onboard mode:
  - robot camera parameters;
  - thresholds tuned for real-world usage.
- `f_gesture_recognition`:
  - list of supported gestures;
  - temporal window length (number of frames);
  - path to the ST-GCN model.
- `f_fox_command`:
  - gesture → command mapping;
  - velocity/turn limits;
  - safety parameters.
TODO: add more.
Short-term:
- Finalize the Jetson Docker image (smaller, more robust runtime).
- Document all launch files and typical run scenarios.
Mid-term:
- Enable more optimized inference backends (FP16 / INT8, optionally TensorRT).
- Extend the gesture set and support multiple people in frame.
- Improve HRI scenarios (approach, follow, safe stop, etc.).
Long-term:
- Support additional platforms (Go1, Aliengo, etc.).
- Integrate with trajectory planning and mapping systems.