---
title: Overview
weight: 2
layout: learningpathall
---
Voice-based LLM applications often rely primarily on transcribed text from speech input, such as in interactions with non-player characters in games or voice assistants. This approach can overlook vocal cues—like tone, pitch, and emotion—present in a speaker’s voice. As a result, responses may feel less natural and may not fully capture the user’s underlying intent.

To address this, voice-based sentiment classification analyzes audio input to determine the user’s emotional state, which is then incorporated into the LLM prompt to enable more context-aware responses. In this Learning Path, you will build a sentiment-aware voice assistant that runs entirely on-device. The application records audio, transcribes speech into written text using Whisper, classifies sentiment directly from the voice signal, and combines the transcript and voice-based sentiment to guide responses from a local LLM running with llama.cpp.

![Voice sentiment classification pipeline#center](1_vsapipeline2.png "Voice sentiment classification pipeline")

You will start by building a baseline voice-to-LLM pipeline—capturing audio, transcribing it into text, and using it to generate responses with an LLM. You will then extend this pipeline with a voice-based sentiment classification model. This involves training the model, optimizing it for efficient on-device inference, and integrating it into a unified application.
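
The combination step described above can be sketched as a simple prompt builder. The template and sentiment label below are illustrative only; the exact labels and wording used later in the Learning Path may differ.

```python
def build_prompt(transcript: str, sentiment: str) -> str:
    """Combine a Whisper transcript with a sentiment label derived
    from the audio signal, so the LLM can react to how something was
    said as well as what was said. Template is illustrative.
    """
    return (
        f"The user sounds {sentiment}. "
        f"Respond appropriately to their message: {transcript}"
    )

print(build_prompt("My order still hasn't arrived.", "frustrated"))
```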
---
title: Set up your environment
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Before building the voice assistant, create a project workspace and set up an isolated `UV` environment. This keeps project dependencies separate from your system installation and makes it easier to reproduce the steps in the rest of the Learning Path.

These instructions support Ubuntu, macOS, and Windows, with Python 3.9 or later and a working microphone.

Check your Python version before continuing:

**Ubuntu or macOS**

```bash
python3 --version
```

**Windows PowerShell**

```powershell
py -3 --version
```

## Set up the Python environment with UV

`UV` is a fast Python package and environment manager that you will use throughout this Learning Path to create the project environment and install dependencies. Install it from PyPI using `pip`, then create and activate a virtual environment:

**Ubuntu or macOS**

```bash
mkdir -p ~/voice-sentiment-assistant
cd ~/voice-sentiment-assistant
python3 -m pip install uv
uv venv .venv
source .venv/bin/activate
```

**Windows PowerShell**

```powershell
mkdir $HOME\voice-sentiment-assistant -Force
cd $HOME\voice-sentiment-assistant
py -3 -m pip install uv
uv venv .venv
.\.venv\Scripts\Activate.ps1
```

Keep this virtual environment activated while you complete the rest of the Learning Path.

Create a `requirements.txt` file for the packages used across the rest of the Learning Path:

```txt
gradio
openai-whisper
requests
torch
transformers
pandas
numpy
librosa
scikit-learn
onnx
onnxruntime
```

Install the dependencies into your active `UV` virtual environment:

```console
uv pip install -r requirements.txt
```

This installs the libraries needed for the Gradio interface, Whisper transcription, model training, and ONNX Runtime inference. Some packages in this list are used later in the Learning Path when you optimize and export the sentiment model.

## Download, build, and run llama.cpp

Next, clone the [llama.cpp GitHub repository](https://github.com/ggml-org/llama.cpp), build the local inference server, and start it. This server exposes an OpenAI-compatible API that the Python application will call later in the Learning Path.

**Ubuntu or macOS**

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

**Windows PowerShell**

```powershell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

When the build completes, the `llama-server` executable should be available in the build output directory. This Learning Path uses a quantized [Gemma 3 1B instruction-tuned model](https://huggingface.co/google/gemma-3-1b-it) served locally through `llama.cpp`.

The first time you run this command, `llama.cpp` will download the model from Hugging Face. This can take several minutes depending on your network connection.

**Ubuntu or macOS**

Run the following command from the `llama.cpp` directory:

```bash
./build/bin/llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```

**Windows PowerShell**

```powershell
.\build\bin\Release\llama-server.exe -hf ggml-org/gemma-3-1b-it-GGUF
```

Leave this terminal running while you test the application in later steps. The server listens on a local OpenAI-compatible endpoint that your app will call to generate responses.
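
Before moving on, you can optionally confirm the server is reachable from Python. This sketch uses only the standard library and assumes `llama-server`'s default port of 8080 and its `/health` endpoint; it returns `False` instead of raising if the server is not running.

```python
import urllib.request
import urllib.error

def server_is_up(url: str = "http://127.0.0.1:8080/health",
                 timeout: float = 2.0) -> bool:
    """Return True if an HTTP service answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not up"
        return False

print(server_is_up())
```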

At this point, your development environment is ready. You have created a `UV` environment, installed the Python dependencies, built `llama.cpp`, and started a local `llama-server`. In the next section, you will use this setup to build the baseline voice-to-LLM pipeline by creating a simple Gradio interface, transcribing microphone input with Whisper, and sending the transcript to the local LLM.
---
title: Build the voice-to-LLM pipeline
weight: 4
layout: learningpathall
---

In this section, you will build an end-to-end pipeline that:

1. Records audio from your microphone
2. Transcribes it to text using Whisper
3. Sends the text to a locally hosted LLM
4. Displays the model's response

This forms the foundation of your voice assistant.

![Baseline voice-to-LLM pipeline#center](3_vsapipeline1.png "Baseline voice-to-LLM pipeline")

Before you begin, make sure you have completed the environment setup in the previous section and that your `llama-server` is still running.

### Step 1.1 - Create a basic Gradio UI

Start by creating a simple web interface that captures microphone input.
Gradio is a Python library for building simple browser-based interfaces. Here, you use it to create a small front end that records audio from your microphone.

This is a good first step because it lets you confirm that microphone capture works before you add transcription and model inference.

Create a file called `app.py`:

```python
import gradio as gr

with gr.Blocks() as demo:
    mic = gr.Audio(sources="microphone", type="filepath")

demo.launch()
```

Run the app:

```bash
python app.py
```

Open your browser at:

`http://127.0.0.1:7860`

You should now see a simple interface that allows you to record audio.
At this stage, the app only captures audio. It does not yet transcribe speech or send anything to the LLM.

### Step 1.2 - Add speech-to-text with Whisper

Next, add transcription using the Whisper model.
Whisper is a speech-to-text model. It takes audio as input and returns a text transcript. In this pipeline, it converts spoken input into text before anything is sent to the LLM.

Update `app.py` with the following code:

```python
import whisper

# Load a small Whisper model for local transcription
model = whisper.load_model("base")

def transcribe(audio):
    return model.transcribe(audio)["text"]
```

The first time you run this, Whisper will download the model, which may take a few minutes.

At this stage, your app can convert recorded audio into text.
The output of this step is a text transcript that represents what the user said.

### Step 1.3 - Connect to the local LLM

Define the OpenAI-compatible endpoint exposed by `llama-server`.
An endpoint is the URL your program uses to talk to another service. In this case, `llama-server` exposes a local API on your machine, and your app sends the transcript there to get a response.

Because the server is OpenAI-compatible, the request format looks like a standard chat completions API.
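
For reference, a successful response from the server has roughly this shape (values illustrative); the generated text is at `choices[0].message.content`, which the app reads in a later step:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      }
    }
  ]
}
```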

Update `app.py` with the following import and endpoint definition:

```python
import requests

LOCAL_LLM_URL = "http://127.0.0.1:8080/v1/chat/completions"
```

Make sure your `llama-server` from the previous section is running before continuing.
Without the local server running, the next step will not be able to generate an answer.

### Step 1.4 - Build the full pipeline

Now combine transcription and LLM interaction into a single function.
This function becomes the core of the application. Audio goes in, text is extracted, that text is sent to the model, and the response comes back out.

Keeping the logic in one function makes it easier to connect the pipeline to the user interface in the next step.

Update `app.py` by adding the following function:

```python
def handle_audio(audio):
    # Step 1: Transcribe audio
    text = transcribe(audio)

    # Step 2: Send transcript to local LLM
    response = requests.post(
        LOCAL_LLM_URL,
        json={
            "model": "local-model",
            "messages": [{"role": "user", "content": text}],
        },
        timeout=120,  # avoid hanging indefinitely if the server stalls
    )

    if response.status_code != 200:
        return text, "Error: LLM request failed"

    data = response.json()
    answer = data["choices"][0]["message"]["content"]

    return text, answer
```

### Step 1.5 - Connect the UI to the pipeline

Update your UI so that recorded audio triggers the full pipeline and displays results.
This is the final integration step. You now connect the interface, transcription, and model request so the app behaves like a real voice assistant.

When the user records audio, Gradio calls your pipeline function. The app then shows both the transcript and the assistant response in the browser.

Update `app.py` so it contains the following complete version:

```python
import gradio as gr
import whisper
import requests

model = whisper.load_model("base")

LOCAL_LLM_URL = "http://127.0.0.1:8080/v1/chat/completions"

def transcribe(audio):
    return model.transcribe(audio)["text"]

def handle_audio(audio):
    text = transcribe(audio)

    response = requests.post(
        LOCAL_LLM_URL,
        json={
            "model": "local-model",
            "messages": [{"role": "user", "content": text}],
        },
        timeout=120,  # avoid hanging indefinitely if the server stalls
    )

    if response.status_code != 200:
        return text, "Error: LLM request failed"

    data = response.json()
    answer = data["choices"][0]["message"]["content"]

    return text, answer

with gr.Blocks() as demo:
    mic = gr.Audio(sources="microphone", type="filepath")
    transcript = gr.Textbox(label="Transcript")
    response = gr.Textbox(label="LLM Response")

    mic.change(fn=handle_audio, inputs=mic, outputs=[transcript, response])

demo.launch()
```

## What you should see

After recording audio in the browser:

- Your speech is transcribed into text
- The transcript is sent to the local LLM
- The LLM response is displayed in the interface

## Troubleshooting

- No response from LLM: ensure `llama-server` is still running.
- Whisper is slow on first run: this is expected due to model download and initialization.
- Microphone not working: check browser permissions for microphone access.

At this point, you have a working voice-to-LLM pipeline. In the next section, you will extend this pipeline by adding a voice sentiment classification model.