
🎶 GestureCap Demo

Real-time markerless gesture & pose-based music interaction

A low-latency, multimodal gesture-recognition system built on MediaPipe, in which hand and full-body pose gestures drive real-time sound generation. The project explores music–movement–dance interaction and serves as a standalone technical demo.

🚀 Overview

GestureCap Demo is a real-time interactive system that translates human movement into sound using a webcam—without markers, wearables, or sensors.

The system supports:

  • ✋ Hand gestures for fine motor control
  • 🧍 Full-body pose gestures for expressive, large-scale interaction
  • ⚡ Low-latency audio feedback (<35 ms)
  • 📊 Quantitative performance logging for research evaluation

This demo acts as a foundation for artistic, neuroscientific, and therapeutic applications involving embodied interaction.

✨ Key Features

✋ Hand-Based Gestures

  • Open palm
  • Fist
  • Rule-based finger counting using MediaPipe Hands
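A minimal sketch of how such rule-based counting can work. The landmark indices follow the official 21-point MediaPipe Hands model, but the thresholds, label names, and function signatures below are assumptions for illustration, not the repo's actual code:

```python
# Hypothetical sketch of rule-based finger counting on MediaPipe Hands
# landmarks. Each landmark is a normalized (x, y) pair; the tip/PIP
# indices are the real MediaPipe 21-point hand-model values.

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP joints

def count_fingers(landmarks, handedness="Right"):
    """Count extended fingers from 21 normalized (x, y) landmark pairs.

    A finger counts as 'up' when its tip is above (smaller y than) its
    PIP joint; the thumb is compared on x because it extends sideways.
    """
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:   # image y grows downward
            count += 1
    # Thumb: tip (4) vs IP joint (3); direction flips with handedness
    if handedness == "Right":
        if landmarks[4][0] < landmarks[3][0]:
            count += 1
    elif landmarks[4][0] > landmarks[3][0]:
        count += 1
    return count

def classify_hand(landmarks, handedness="Right"):
    """Map the finger count to gesture labels (label names assumed)."""
    n = count_fingers(landmarks, handedness)
    if n == 0:
        return "fist"
    if n == 5:
        return "open_palm"
    return f"{n}_fingers"
```

Working on raw landmark coordinates keeps the classifier a pure function, so it can be unit-tested without a camera or the MediaPipe runtime.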

🧍 Pose-Based Gestures

  • left_arm_up
  • right_arm_up
  • both_arms_up

Detected using MediaPipe Pose landmarks.
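The arm-raise rules can be expressed as simple landmark comparisons. The indices 11/12 (shoulders) and 15/16 (wrists) are the real MediaPipe Pose landmark numbers; the rule itself (wrist above shoulder) is a plausible sketch, not necessarily the repo's exact logic:

```python
# Hypothetical arm-raise detection on MediaPipe Pose landmarks.
# Landmarks are normalized (x, y) pairs; image y grows downward,
# so "above" means a smaller y value.

LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16

def detect_pose_gesture(landmarks):
    """Return one of the demo's pose labels, or None if no arm is raised."""
    left_up = landmarks[LEFT_WRIST][1] < landmarks[LEFT_SHOULDER][1]
    right_up = landmarks[RIGHT_WRIST][1] < landmarks[RIGHT_SHOULDER][1]
    if left_up and right_up:
        return "both_arms_up"
    if left_up:
        return "left_arm_up"
    if right_up:
        return "right_arm_up"
    return None
```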

🔊 Sound Interaction

  • Gesture-triggered sound generation
  • Non-blocking, threaded audio playback
  • Distinct pitch mapping per gesture
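Non-blocking playback matters because any audio call on the vision thread would stall frame capture. A sketch of the idea, assuming the `sounddevice` package as the backend (the real demo's audio engine may differ; playback degrades to a no-op if no backend is available):

```python
import math
import threading

def synth_tone(freq_hz, dur_s=0.15, sr=44100, amp=0.4):
    """Render a short sine burst as a list of float samples."""
    n = int(sr * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

def play_async(samples, sr=44100):
    """Play samples on a daemon thread so the vision loop never blocks.

    Uses `sounddevice` if installed (an assumption about the backend);
    otherwise the sound is dropped silently.
    """
    def _worker():
        try:
            import sounddevice as sd
            sd.play(samples, sr)
            sd.wait()
        except Exception:
            pass  # no audio backend in this environment
    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    return t
```

The daemon flag means in-flight sounds never keep the process alive after the main loop exits on ESC.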

⚡ Real-Time Performance

  • FPS: ~28–35
  • End-to-end latency: ~25–35 ms
  • Optimized MediaPipe configuration (model_complexity=0)

🧠 Interaction Stability

  • Gesture debouncing with cooldown
  • Multimodal gesture priority handling
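Both ideas can be sketched in a small state object. The cooldown value and the pose-over-hand priority rule here are assumptions about the demo's behavior; the clock is injectable so the logic is testable without real delays:

```python
import time

class GestureDebouncer:
    """Suppress repeat triggers of a gesture inside a cooldown window,
    and resolve simultaneous detections with a priority rule.

    A sketch of the debounce + priority handling described above; the
    0.5 s cooldown and pose-wins rule are illustrative assumptions.
    """

    POSE_GESTURES = {"left_arm_up", "right_arm_up", "both_arms_up"}

    def __init__(self, cooldown_s=0.5, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock              # injectable for testing
        self._last_fired = {}

    def select(self, candidates):
        """Pick one gesture from simultaneous detections (pose wins)."""
        for g in candidates:
            if g in self.POSE_GESTURES:
                return g
        return candidates[0] if candidates else None

    def should_fire(self, gesture):
        """True if the gesture may trigger now; records the trigger time."""
        now = self.clock()
        last = self._last_fired.get(gesture)
        if last is not None and now - last < self.cooldown_s:
            return False
        self._last_fired[gesture] = now
        return True
```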

📊 Evaluation & Logging

  • Per-gesture latency measurement
  • Automatic CSV logging (latency_log.csv)
  • Console-based real-time feedback
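The logging step can be as simple as appending one CSV row per gesture event. The column names below are assumptions; the real `latency_log.csv` schema may differ:

```python
import csv
import time

class LatencyLogger:
    """Append per-gesture latency rows to a CSV file.

    Sketch of the evaluation logging described above; column names
    and the ms formatting are illustrative assumptions.
    """

    FIELDS = ["timestamp", "gesture", "latency_ms"]

    def __init__(self, path="latency_log.csv"):
        self.path = path
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(self.FIELDS)

    def log(self, gesture, started_at):
        """Record latency from `started_at` (a time.perf_counter() value)."""
        latency_ms = (time.perf_counter() - started_at) * 1000.0
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow(
                [time.time(), gesture, f"{latency_ms:.2f}"]
            )
        return latency_ms
```

Reopening the file per event keeps rows on disk even if the app is killed mid-session, at the cost of a few syscalls per gesture.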

🧠 System Architecture

Webcam
  ↓
MediaPipe (Hands + Pose)
  ↓
Gesture Detection Logic
  ↓
Gesture State / Debounce
  ↓
Audio Engine (Non-blocking)
  ↓
Latency Measurement + Logging

The architecture is modular, making it easy to extend with:

  • MIDI / OSC output
  • Machine-learning gesture classifiers
  • Continuous control (pitch / volume modulation)

📁 Project Structure

gesturecap-demo/
│
├── src/
│   ├── capture/        # Camera & FPS handling
│   ├── vision/         # Hand & pose tracking
│   ├── gestures/       # Gesture logic & state
│   ├── audio/          # Sound & TTS engines
│   ├── evaluation/     # Latency tracking & logging
│   └── main.py         # Application entry point
│
├── config/             # Gesture & audio mappings
├── latency_log.csv     # Generated performance logs
├── requirements.txt
└── README.md

🛠️ Installation

1️⃣ Clone the repository

git clone https://github.com/<your-username>/gesturecap-demo.git
cd gesturecap-demo

2️⃣ Create and activate a virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

3️⃣ Install dependencies

pip install -r requirements.txt

▶️ Running the Demo

python src/main.py

Controls

  • Show hand or raise arms in front of the camera
  • Press ESC to exit

🎮 Gesture Mapping (Current)

| Gesture      | Interaction         |
| ------------ | ------------------- |
| Open Palm    | Mid-frequency tone  |
| Fist         | Low-frequency tone  |
| Left Arm Up  | Bass tone           |
| Right Arm Up | High-frequency tone |
| Both Arms Up | Strong accent tone  |
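In code, the mapping reduces to a lookup table. The frequencies below are illustrative placeholders consistent with the low/mid/high/bass/accent descriptions, not the demo's actual values:

```python
# Hypothetical gesture-to-pitch table mirroring the mapping above;
# the exact frequencies are assumed for illustration.
GESTURE_FREQS_HZ = {
    "open_palm":     440.0,  # mid-frequency tone (A4)
    "fist":          220.0,  # low-frequency tone (A3)
    "left_arm_up":   110.0,  # bass tone (A2)
    "right_arm_up":  880.0,  # high-frequency tone (A5)
    "both_arms_up": 1760.0,  # strong accent tone (A6)
}

def frequency_for(gesture, default=440.0):
    """Look up the tone frequency for a gesture, with a safe fallback."""
    return GESTURE_FREQS_HZ.get(gesture, default)
```

Keeping the table in config (see `config/`) rather than hard-coding it makes remapping gestures a data change, not a code change.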

📊 Performance & Evaluation

  • Latency is measured per gesture event
  • Results are logged to latency_log.csv

Typical results on CPU:

  • Latency: 25–35 ms
  • FPS: 28–35

This makes the system suitable for real-time musical interaction.

🎯 Motivation & Use Cases

  • 🎵 Gesture-driven musical instruments
  • 💃 Dance-controlled sound generation
  • 🧠 Research on embodied cognition & agency
  • 🧑‍⚕️ Therapeutic & rehabilitative interaction systems

📜 License

MIT License

🙌 Author

Pranav Ghorpade

Electronics & Telecommunication Engineering | Open-source & GSoC aspirant
