A low-latency, multimodal gesture-recognition system using MediaPipe, enabling hand and full-body pose gestures to drive real-time sound generation. This project explores music–movement–dance interaction and serves as a technical demonstration of embodied, camera-based interaction.
GestureCap Demo is a real-time interactive system that translates human movement into sound using a webcam—without markers, wearables, or sensors.
The system supports:
- ✋ Hand gestures for fine motor control
- 🧍 Full-body pose gestures for expressive, large-scale interaction
- ⚡ Low-latency audio feedback (<35 ms)
- 📊 Quantitative performance logging for research evaluation
This demo acts as a foundation for artistic, neuroscientific, and therapeutic applications involving embodied interaction.
Hand gestures (rule-based, using MediaPipe Hands):
- Open palm
- Fist
- Finger counting
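The rule-based counting could look roughly like this — a sketch assuming MediaPipe's 21-landmark hand convention (tip/PIP index pairs); the function names are illustrative, not the project's actual code:

```python
# Sketch of rule-based finger counting over MediaPipe's 21 hand landmarks.
# Landmarks are (x, y) pairs in normalized image coordinates (y grows downward).
# Tip/PIP index pairs follow MediaPipe Hands for index..pinky.
FINGER_TIP_PIP = [(8, 6), (12, 10), (16, 14), (20, 18)]

def count_extended_fingers(landmarks):
    """Count index..pinky fingers whose tip lies above its PIP joint.

    `landmarks` is a sequence of 21 (x, y) tuples. The thumb is ignored
    here because it folds sideways and would need an x-axis rule instead.
    """
    count = 0
    for tip, pip in FINGER_TIP_PIP:
        if landmarks[tip][1] < landmarks[pip][1]:  # tip above PIP => extended
            count += 1
    return count

def classify_hand(landmarks):
    """Map the finger count to the demo's two hand gestures."""
    n = count_extended_fingers(landmarks)
    if n == 0:
        return "fist"
    if n == 4:
        return "open_palm"
    return None  # ambiguous pose; let the debouncer ignore it
```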
Pose gestures (detected from MediaPipe Pose landmarks):
- left_arm_up
- right_arm_up
- both_arms_up
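A minimal sketch of the pose-gesture logic, assuming MediaPipe Pose's 33-landmark index map (11/12 = shoulders, 15/16 = wrists) and a simple wrist-above-shoulder rule; the exact thresholds in the demo may differ:

```python
# Pose-gesture detection from MediaPipe Pose landmarks.
# Indices follow MediaPipe Pose: 11/12 = left/right shoulder, 15/16 = left/right wrist.
# y is in normalized image coordinates, so "up" means a smaller y value.
L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 11, 12, 15, 16

def classify_pose(landmarks):
    """Return one of the demo's pose gestures, or None.

    `landmarks` is a sequence of 33 (x, y) tuples indexed per MediaPipe Pose.
    An arm counts as raised when its wrist is above its shoulder.
    """
    left_up = landmarks[L_WRIST][1] < landmarks[L_SHOULDER][1]
    right_up = landmarks[R_WRIST][1] < landmarks[R_SHOULDER][1]
    if left_up and right_up:
        return "both_arms_up"
    if left_up:
        return "left_arm_up"
    if right_up:
        return "right_arm_up"
    return None
```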
- Gesture-triggered sound generation
- Non-blocking, threaded audio playback
- Distinct pitch mapping per gesture
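The non-blocking playback and per-gesture pitch mapping can be sketched with only the standard library. The frequencies below are placeholders (the real mappings live in `config/`), and `backend_write` stands in for whatever audio output the demo actually uses:

```python
import math
import threading
from array import array

SAMPLE_RATE = 44_100

# Hypothetical per-gesture pitch map; the demo's real frequencies live in config/.
GESTURE_PITCH_HZ = {
    "fist": 110.0,           # low tone
    "open_palm": 440.0,      # mid tone
    "left_arm_up": 82.0,     # bass tone
    "right_arm_up": 880.0,   # high tone
    "both_arms_up": 1320.0,  # accent tone
}

def synth_tone(freq_hz, duration_s=0.15, amplitude=0.4):
    """Render a sine tone as 16-bit PCM samples."""
    n = int(SAMPLE_RATE * duration_s)
    return array("h", (
        int(32767 * amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ))

def play_async(gesture, backend_write):
    """Synthesize a tone and hand samples to an audio backend on a daemon thread.

    `backend_write` stands in for a real output call (e.g. a sounddevice or
    PyAudio stream write); running it off the vision thread is what keeps
    playback non-blocking.
    """
    samples = synth_tone(GESTURE_PITCH_HZ[gesture])
    t = threading.Thread(target=backend_write, args=(samples,), daemon=True)
    t.start()
    return t
```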
- FPS: ~28–35
- End-to-end latency: ~25–35 ms
- Optimized MediaPipe configuration (model_complexity=0)
- Gesture debouncing with cooldown
- Multimodal gesture priority handling
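The debouncing and priority handling above could be sketched as follows; the class name, the 0.5 s default cooldown, and the pose-over-hand ordering are assumptions for illustration, not the project's actual code:

```python
import time

class GestureDebouncer:
    """Suppress repeat triggers of the same gesture within a cooldown window.

    A pose gesture seen in the same frame as a hand gesture wins, matching
    the demo's multimodal priority handling (assumed ordering).
    """
    def __init__(self, cooldown_s=0.5, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable for testing
        self.last_fired = {}        # gesture name -> last trigger time

    def select(self, hand_gesture, pose_gesture):
        """Pick at most one gesture for this frame, honoring the cooldown."""
        gesture = pose_gesture or hand_gesture  # pose has priority
        if gesture is None:
            return None
        now = self.clock()
        last = self.last_fired.get(gesture)
        if last is not None and now - last < self.cooldown_s:
            return None  # still cooling down
        self.last_fired[gesture] = now
        return gesture
```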
- Per-gesture latency measurement
- Automatic CSV logging (latency_log.csv)
- Console-based real-time feedback
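Per-gesture latency logging to CSV could be sketched like this; the column names and row schema are illustrative, not necessarily the exact format of the generated `latency_log.csv`:

```python
import csv
import time

class LatencyLogger:
    """Append per-gesture latency rows to a CSV file (hypothetical schema).

    Each row records when a gesture fired and the frame-to-trigger delay in
    milliseconds, mirroring the kind of data the demo writes to latency_log.csv.
    """
    FIELDS = ("timestamp", "gesture", "latency_ms")

    def __init__(self, path="latency_log.csv"):
        self.path = path
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerow(self.FIELDS)

    def log(self, gesture, frame_time_s, trigger_time_s=None):
        """Compute and persist latency; also echo it to the console."""
        if trigger_time_s is None:
            trigger_time_s = time.monotonic()
        latency_ms = (trigger_time_s - frame_time_s) * 1000.0
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow([time.time(), gesture, f"{latency_ms:.1f}"])
        print(f"{gesture}: {latency_ms:.1f} ms")  # real-time console feedback
        return latency_ms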
```
Webcam
  ↓
MediaPipe (Hands + Pose)
  ↓
Gesture Detection Logic
  ↓
Gesture State / Debounce
  ↓
Audio Engine (Non-blocking)
  ↓
Latency Measurement + Logging
```
The architecture is modular, making it easy to extend with:
- MIDI / OSC output
- Machine-learning gesture classifiers
- Continuous control (pitch / volume modulation)
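As one extension path, OSC output needs no heavy dependencies: a minimal encoder can be written with the standard library alone. This is a sketch following the OSC 1.0 wire format; the `/gesturecap/pitch` address is purely illustrative:

```python
import socket
import struct

def _osc_pad(b):
    """Null-terminate and pad bytes to a 4-byte boundary, per the OSC 1.0 spec."""
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address, value):
    """Encode a single-float OSC message, e.g. /gesturecap/pitch 440.0."""
    return (_osc_pad(address.encode("ascii"))
            + _osc_pad(b",f")                 # type tag: one float32 argument
            + struct.pack(">f", value))       # big-endian float payload

def send_gesture_pitch(sock, host, port, pitch_hz):
    """Fire the current gesture's pitch at an OSC-capable synth over UDP."""
    sock.sendto(osc_message("/gesturecap/pitch", pitch_hz), (host, port))
```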
```
gesturecap-demo/
│
├── src/
│   ├── capture/        # Camera & FPS handling
│   ├── vision/         # Hand & pose tracking
│   ├── gestures/       # Gesture logic & state
│   ├── audio/          # Sound & TTS engines
│   ├── evaluation/     # Latency tracking & logging
│   └── main.py         # Application entry point
│
├── config/             # Gesture & audio mappings
├── latency_log.csv     # Generated performance logs
├── requirements.txt
└── README.md
```
```bash
git clone https://github.com/<your-username>/gesturecap-demo.git
cd gesturecap-demo

python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

pip install -r requirements.txt
python src/main.py
```
- Show hand or raise arms in front of the camera
- Press ESC to exit
| Gesture | Interaction |
|---|---|
| Open Palm | Mid-frequency tone |
| Fist | Low-frequency tone |
| Left Arm Up | Bass tone |
| Right Arm Up | High-frequency tone |
| Both Arms Up | Strong accent tone |
- Latency is measured per gesture event
- Results are logged to latency_log.csv
- Typical results on CPU:
  - Latency: 25–35 ms
  - FPS: 28–35

This makes the system suitable for real-time musical interaction.
- 🎵 Gesture-driven musical instruments
- 💃 Dance-controlled sound generation
- 🧠 Research on embodied cognition & agency
- 🧑‍⚕️ Therapeutic & rehabilitative interaction systems
MIT License
Electronics & Telecommunication Engineering · Open-source contributor & GSoC aspirant