---
title: Voice Agent – Speech → Intent → Tools
emoji: 🎤
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - speech-recognition
  - whisper
  - intent-detection
  - ai-agent
  - gradio
---

# 🎤 Voice Agent

**Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.**

Live demo: **[Open the app ↗](https://huggingface.co/spaces/hudaakram/Voice_Agent)**



---

## 🔍 Abstract

Voice Agent turns short speech snippets into **actions**. It:

1) transcribes audio with Whisper,
2) infers the **intent** from the text (zero-shot),
3) optionally **executes a tool** (e.g., “turn_on_lights”, “set_timer”).

This showcases an **AI agent** loop: *Perceive → Understand → Act*.

---

## 🧱 Pipeline



1. **Audio capture** (Gradio mic/upload, 16 kHz)
2. **ASR**: `openai/whisper-small`
3. **Intent detection**: zero-shot text classification over a user-editable list of intents
   *(e.g., `turn_on_lights, start_music, set_timer, create_note, open_calendar`)*
4. **Tool layer** (mock functions in this Space) → returns a JSON “execution log”
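
The intent-to-action step (3 → 4 above) can be sketched in a few lines: pick the top-scoring intent and, if it clears a confidence threshold, dispatch it to a mock tool that returns a JSON execution log. The tool table, threshold, and function name below are illustrative, not the Space's actual code.

```python
import json

# Illustrative mock tools; the real Space defines its own tool layer.
TOOLS = {
    "turn_on_lights": lambda: {"device": "lights", "state": "on"},
    "set_timer": lambda: {"timer": "started"},
}

def act_on_intent(labels, scores, threshold=0.5):
    """Pick the top zero-shot intent and run its mock tool, if any."""
    best_label, best_score = max(zip(labels, scores), key=lambda p: p[1])
    log = {"intent": best_label, "confidence": round(best_score, 3)}
    if best_score >= threshold and best_label in TOOLS:
        log["result"] = TOOLS[best_label]()
    else:
        log["result"] = None  # below threshold, or no tool registered
    return json.dumps(log)

print(act_on_intent(["turn_on_lights", "set_timer"], [0.91, 0.09]))
```

Returning the log as JSON keeps the Gradio output component simple: it can display the string directly.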

---

## 🧪 Try it

1) Click **Record** and say something like:
   - “turn the lights on please”
   - “open my calendar next Tuesday”
   - “set a timer for five minutes”
2) Or upload a short `.wav`/`.mp3`.
3) See **Top-k intents**, **Chosen intent**, and **Action result**.

---

## 🧩 Models & Libraries

- ASR: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
- Zero-shot intent: `transformers` pipeline (`facebook/bart-large-mnli` by default)
- UI: [Gradio](https://www.gradio.app/) on Hugging Face Spaces
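
A minimal sketch of how the two pipelines could be wired together. The helper name and its injectable arguments are an assumption, not the Space's actual code; it relies only on the standard `transformers` return shapes (`{"text": ...}` from the ASR pipeline, `{"labels": [...], "scores": [...]}` from zero-shot classification).

```python
def transcribe_and_classify(audio_path, asr, classifier, intents):
    """Run ASR on the audio, then zero-shot intent detection on the transcript."""
    text = asr(audio_path)["text"]
    out = classifier(text, candidate_labels=intents)
    # Pair each candidate intent with its score, best first.
    return text, list(zip(out["labels"], out["scores"]))

# With the real pipelines (downloads the models on first run):
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# text, ranked = transcribe_and_classify("clip.wav", asr, classifier,
#                                        ["turn_on_lights", "set_timer"])
```

Injecting the pipelines as arguments keeps the glue logic testable without loading any model weights.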

---

## ⚙️ Requirements

This Space uses `requirements.txt`:

```txt
transformers>=4.41.0
torch
torchaudio
gradio>=4.0.0
librosa
soundfile
```
|