Real Time Demo that allows natural conversations#91

Open
freddyaboulton wants to merge 1 commit into QwenLM:main from freddyaboulton:main

@freddyaboulton

Overview

This PR adds an interactive demo that enables natural, continuous conversations with Qwen2-Audio. Users talk to the model through their microphone, and a response is generated automatically once they finish speaking, with no button press needed between turns. This makes the model more accessible and more natural to interact with.

Key Features

  • Real-time audio streaming using WebRTC
  • Automatic speech detection and processing
  • Support for both local and cloud deployment
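
To give a rough idea of what "automatic speech detection" involves, here is a minimal, self-contained sketch of energy-based pause detection. It is not the code in this PR and not how gradio-webrtc implements it. The names `rms` and `find_utterances`, the threshold, and the frame counts are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_utterances(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,      # assumed energy threshold, tune per microphone
    min_silence_frames: int = 3,  # assumed pause length that ends an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) for each utterance.

    An utterance starts at the first frame louder than `threshold` and ends
    once `min_silence_frames` consecutive quiet frames have been seen.
    """
    utterances: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) > threshold
        if start is None:
            if loud:
                start = i
                silent_run = 0
        elif loud:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Exclude the trailing quiet frames from the utterance.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The stream ended mid-utterance; close it at the last loud frame.
        utterances.append((start, len(frames) - silent_run))
    return utterances


if __name__ == "__main__":
    quiet = [0.0] * 160
    loud = [0.5] * 160
    stream = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud, quiet]
    print(find_utterances(stream))
```

In a demo like this one, each completed utterance would be handed to the model as soon as the pause is detected. A production setup would more likely use a trained voice activity detector than a fixed energy threshold.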

Dependencies

Added requirements:

  • gradio-webrtc (a Gradio custom component that enables real-time audio/video streaming). Disclaimer: I am the author of this extension.
  • twilio (optional, for cloud deployment)

Demo

qwen2-audio.mp4

There is some delay in processing the response because the demo has to acquire the shared GPU on Hugging Face Spaces. On dedicated hardware it should be much faster, but I don't have the GPUs to verify this myself.

@robinnarsinghranabhat

Hi, sorry to ask here.

I am trying to run Qwen on my Apple M3 Pro (18 GB of unified memory). The basic inference snippet from the Hugging Face examples takes too long (about 5 minutes). I thought the mps device would be fast.

Any suggestions on what could be done?
