ROS 2 stack for a Unitree A1 quadruped with an onboard Jetson Orin Nano, providing:
- real-time human detection and pose estimation (YOLO-pose, ONNX Runtime);
- gesture recognition from skeletons (ST-GCN);
- high-level locomotion commands for the A1;
- the ability to run on the real robot or locally on a GPU PC for neural network debugging.
- An RGB camera (on the Jetson / local machine) publishes a video stream.
- `f_human_detection2`:
  - runs the pose detection model (YOLO-pose in ONNX format);
  - detects people and skeletal keypoints;
  - publishes `PersonBodyArray` messages for downstream modules;
  - optionally publishes a debug image and RViz configuration.
- `f_gesture_recognition`:
  - accumulates temporal sequences of skeletons;
  - classifies gestures using ST-GCN;
  - publishes `PersonAction` messages.
- `f_fox_command`:
  - converts gestures and scene state into motion commands;
  - generates `FoxCommand` messages (velocity, turn rate, modes, etc.).
- `f_a1_lcm_control`:
  - receives `FoxCommand`;
  - converts it to Unitree SDK format (via LCM + `unitree_legged_sdk`);
  - sends low-level commands to the A1.
- `unitree_ros2` and `unitree_hardware` provide integration with the physical robot and simulation.
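The gesture-recognition stage of this pipeline can be sketched in plain Python. The message types and the window size below are illustrative stand-ins, not the actual `f_interfaces` definitions:

```python
from collections import deque
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the f_interfaces messages.
@dataclass
class PersonBody:
    keypoints: List[float]  # flattened (x, y, confidence) triples

@dataclass
class PersonAction:
    label: str
    confidence: float

class GestureBuffer:
    """Accumulates per-frame skeletons until a temporal window is
    full, as f_gesture_recognition does before running ST-GCN."""

    def __init__(self, window_size: int = 30):
        self.window = deque(maxlen=window_size)

    def push(self, body: PersonBody):
        self.window.append(body)

    def ready(self) -> bool:
        # The classifier only runs on a complete temporal window.
        return len(self.window) == self.window.maxlen

buf = GestureBuffer(window_size=3)
for _ in range(3):
    buf.push(PersonBody(keypoints=[0.0] * 51))  # 17 keypoints x 3
print(buf.ready())  # True once the window is full
```

In the real stack the full window would be fed to the ST-GCN model and the result published as a `PersonAction`.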
Main top-level directories:
- `f_human_detection2/` — human pose detection:
  - `f_human_detection2/pose_node.py` — main ROS 2 node;
  - `f_human_detection2/backends/` — inference backends:
    - `yolo11_pose_onnxrt.py` — ONNX Runtime + CUDA;
    - `yolo11_pose_trt.py` — experimental TensorRT backend (disabled by default);
  - `config/dev.yaml`, `config/prod.yaml` — configs for local and onboard modes;
  - `launch/dev.launch.py`, `launch/prod.launch.py`, `launch/dummy.launch.py` — ready-to-use launch files;
  - `models/` — ONNX models and auxiliary files;
  - `rviz/pose_debug.rviz` — RViz config for debugging.
- `f_gesture_recognition/` — gesture recognition:
  - `pose_classifier.py` — ST-GCN inference on skeletons;
  - `models/` — ST-GCN models (pretrained and custom);
  - `examples/people.yaml` — example labels/config for gestures.
- `f_fox_command/` — high-level command logic:
  - `command_node.py` — ROS 2 node mapping gestures to A1 commands.
- `f_a1_lcm_control/` — LCM bridge to the Unitree A1:
  - `a1_lcm_control_node.py` — bridge between ROS 2 messages and the Unitree SDK.
- `f_bringup/` — launching the full stack:
  - `launch/bringup.launch.py` — perception + gesture + command + control;
  - `launch/tf.launch.py` — TF tree configuration.
- `f_interfaces/` — custom ROS 2 messages:
  - `msg/FoxCommand.msg` — robot commands;
  - `msg/PersonBody.msg`, `msg/PersonBodyArray.msg` — skeletons;
  - `msg/PersonAction.msg` — recognized actions/gestures.
- `ml_training/` — gesture model training:
  - `prepare_dataset.ipynb` — dataset preparation from skeletons;
  - `stgcn_custom.py` — ST-GCN fine-tuning.
- `unitree_legged_sdk/`, `unitree_ros2/` — external Unitree code:
  - SDK, drivers, and ROS 2 integration for the A1;
  - typically left unchanged unless really needed.
- `util/` — utility scripts:
  - `animate_skeleton.py` — skeleton visualization;
  - `converse.py`, `lcm_setup.sh` — service scripts.
- `Dockerfile`, `DockerfileJetson` — base images for x86_64 development and Jetson builds.
- `docker-compose.yaml` — services for running the stack in Docker.
- `requirements-docker.txt` — Python dependencies for containers.
Recommended setup:
- Unitree A1 quadruped (controlled via Unitree SDK).
- Onboard computer:
  - NVIDIA Jetson Orin Nano (or a similar Jetson board) with a CUDA-capable GPU;
  - or a desktop/server with an NVIDIA GPU (for local debugging).
- RGB camera:
  - USB/CSI camera exposed as `/dev/videoX`, or the robot's onboard camera.
- Developer host machine (Linux, ideally Ubuntu 20.04/22.04).
- Docker ≥ 20.10.
- Docker Compose (v2, `docker compose`).
- NVIDIA GPU driver + CUDA (on the host, optional).
- NVIDIA Container Toolkit (`nvidia-container-toolkit`) for CUDA in Docker (optional).
- Git with submodule support.
For optional non-Docker runs:
- Python 3.10+;
- ROS 2 (matching the `package.xml` requirements);
- `colcon` + standard ROS 2 build tools;
- ONNX Runtime with CUDA support and (optionally) the TensorRT Execution Provider.
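When ONNX Runtime is installed with GPU support, the detection backend has to pick an execution provider. A minimal sketch of that preference logic (the selection order is an assumption based on this repo's backends; the provider names are the standard ONNX Runtime ones):

```python
def select_providers(available):
    """Order execution providers by preference: TensorRT first,
    then CUDA, then CPU fallback. At runtime `available` would come
    from onnxruntime.get_available_providers()."""
    preference = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    return [p for p in preference if p in available]

# On a CUDA-only machine:
print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list would be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.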
```bash
git clone --recurse-submodules [email protected]:Innopolis-Robotics-Society/project_fabian.git
cd project_fabian
```

If it was cloned earlier without submodules:

```bash
git submodule update --init --recursive
```

- Install the NVIDIA driver and CUDA for your OS/GPU.
- Install NVIDIA Container Toolkit (see NVIDIA docs for details).
- Configure Docker to use the NVIDIA runtime and restart the Docker daemon.
- Build and start the dev container:

  ```bash
  # TODO: adjust service name to match docker-compose.yaml
  docker compose up --build <service>
  ```

  > Note: there are separate services for CPU-only and CUDA runs; check `docker-compose.yaml` for the exact service names.
- Attach another shell to the running container:

  ```bash
  docker compose exec <service> bash
  ```
- Inside the container (once after code/dependency changes):

  ```bash
  colcon build --symlink-install
  source install/setup.bash
  ```

- Run the detection/gesture stack with a webcam:

  ```bash
  # TODO: adjust launch arguments as needed
  ros2 launch f_human_detection2 dev.launch.py
  # in another terminal inside the container:
  ros2 run f_gesture_recognition pose_classifier
  ```
- For model experiments, use `ml_training/`:
  - open `prepare_dataset.ipynb` in Jupyter;
  - run `stgcn_custom.py` for fine-tuning.
- Build and start the dev container:

  ```bash
  # TODO: adjust service name to match docker-compose.yaml
  docker compose up --build fox
  ```

- Attach another shell to the running container:

  ```bash
  docker compose exec fox bash
  ```

- Inside the container:

  ```bash
  colcon build --symlink-install
  source install/setup.bash
  ```

- Start the Unitree low-level components (ROS 2 + SDK):

  ```bash
  # TODO: fill with actual commands from unitree_ros2/unitree_hardware
  ```
Main configs:
- `f_human_detection2/config/dev.yaml` — local mode:
  - camera selection (`/dev/videoX` or a ROS image topic);
  - input frame size, FPS;
  - detection thresholds, NMS, etc.
- `f_human_detection2/config/prod.yaml` — onboard mode:
  - robot camera parameters;
  - thresholds tuned for real-world usage.
- `f_gesture_recognition`:
  - list of supported gestures;
  - temporal window length (number of frames);
  - path to the ST-GCN model.
- `f_fox_command`:
  - gesture → command mapping;
  - velocity/turn limits;
  - safety parameters.
TODO: add more.
Short-term:
- Finalize the Jetson Docker image (smaller, more robust runtime).
- Document all launch files and typical run scenarios.
Mid-term:
- Enable more optimized inference backends (FP16 / INT8, optionally TensorRT).
- Extend the gesture set and support multiple people in frame.
- Improve HRI scenarios (approach, follow, safe stop, etc.).
Long-term:
- Support additional platforms (Go1, Aliengo, etc.).
- Integrate with trajectory planning and mapping systems.