---
title: Voice Agent – Speech → Intent → Tools
emoji: 🎤
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - speech-recognition
  - whisper
  - intent-detection
  - ai-agent
  - gradio
---

# 🎤 Voice Agent

**Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.**

Live demo: **[Open the app ↗](https://huggingface.co/spaces/hudaakram/Voice_Agent)**



---

## 🔍 Abstract

Voice Agent turns short speech snippets into **actions**. It:

1) transcribes audio with Whisper,
2) infers the **intent** from the text (zero-shot),
3) optionally **executes a tool** (e.g., “turn_on_lights”, “set_timer”).

This showcases an **AI agent** loop: *Perceive → Understand → Act*.

---

## 🧱 Pipeline



1. **Audio capture** (Gradio mic/upload, 16 kHz)
2. **ASR**: `openai/whisper-small`
3. **Intent detection**: zero-shot text classification over a user-editable list of intents
   *(e.g., `turn_on_lights, start_music, set_timer, create_note, open_calendar`)*
4. **Tool layer** (mock functions in this Space) → returns a JSON “execution log”
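
The intent-to-action step (3 → 4 above) can be sketched in a few lines: pick the top-scoring intent and, if it clears a confidence threshold, dispatch it to a mock tool that returns a JSON execution log. The tool table, threshold, and function name below are illustrative, not the Space's actual code.

```python
import json

# Illustrative mock tools; the real Space defines its own tool layer.
TOOLS = {
    "turn_on_lights": lambda: {"device": "lights", "state": "on"},
    "set_timer": lambda: {"timer": "started"},
}

def act_on_intent(labels, scores, threshold=0.5):
    """Pick the top zero-shot intent and run its mock tool, if any."""
    best_label, best_score = max(zip(labels, scores), key=lambda p: p[1])
    log = {"intent": best_label, "confidence": round(best_score, 3)}
    if best_score >= threshold and best_label in TOOLS:
        log["result"] = TOOLS[best_label]()
    else:
        log["result"] = None  # below threshold, or no tool registered
    return json.dumps(log)

print(act_on_intent(["turn_on_lights", "set_timer"], [0.91, 0.09]))
```

Returning the log as JSON keeps the Gradio output component simple: it can display the string directly.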

---

## 🧪 Try it

1) Click **Record** and say something like:
   - “turn the lights on please”
   - “open my calendar next Tuesday”
   - “set a timer for five minutes”
2) Or upload a short `.wav`/`.mp3`.
3) See **Top-k intents**, **Chosen intent**, and **Action result**.

---

## 🧩 Models & Libraries

- ASR: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
- Zero-shot intent: `transformers` pipeline (`facebook/bart-large-mnli` by default)
- UI: [Gradio](https://www.gradio.app/) on Hugging Face Spaces
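
A minimal sketch of how the two pipelines could be wired together. The helper name and its injectable arguments are an assumption, not the Space's actual code; it relies only on the standard `transformers` return shapes (`{"text": ...}` from the ASR pipeline, `{"labels": [...], "scores": [...]}` from zero-shot classification).

```python
def transcribe_and_classify(audio_path, asr, classifier, intents):
    """Run ASR on the audio, then zero-shot intent detection on the transcript."""
    text = asr(audio_path)["text"]
    out = classifier(text, candidate_labels=intents)
    # Pair each candidate intent with its score, best first.
    return text, list(zip(out["labels"], out["scores"]))

# With the real pipelines (downloads the models on first run):
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# text, ranked = transcribe_and_classify("clip.wav", asr, classifier,
#                                        ["turn_on_lights", "set_timer"])
```

Injecting the pipelines as arguments keeps the glue logic testable without loading any model weights.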

---

## ⚙️ Requirements

This Space uses `requirements.txt`:

```txt
transformers>=4.41.0
torch
torchaudio
gradio>=4.0.0
librosa
soundfile
```
|