---
title: Voice Agent – Speech → Intent → Tools
emoji: 🎤
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - speech-recognition
  - whisper
  - intent-detection
  - ai-agent
  - gradio
---
# 🎤 Voice Agent

**Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.**

Live demo: **[Open the app ↗](https://huggingface.co/spaces/hudaakram/Voice_Agent)**

---
## 🔍 Abstract
Voice Agent turns short speech snippets into **actions**. It:
1) transcribes audio with Whisper,
2) infers the **intent** from the text (zero-shot),
3) optionally **executes a tool** (e.g., `turn_on_lights`, `set_timer`).

This showcases an **AI agent** loop: *Perceive → Understand → Act*.
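
The *Perceive → Understand → Act* loop can be sketched as a single function that chains three pluggable steps. This is a minimal illustration, not the Space's actual `app.py`: `run_agent`, `fake_transcribe`, and `keyword_intent` are hypothetical names, and the toy stand-ins replace Whisper and the zero-shot classifier so the flow runs without downloading any model. The "Act" step stays optional: when no tool is registered for the detected intent, the result is `None`.

```python
from typing import Any, Callable, Dict, Optional


def run_agent(
    audio: Any,
    transcribe: Callable[[Any], str],
    detect_intent: Callable[[str], str],
    tools: Dict[str, Callable[[str], Dict[str, Any]]],
) -> Dict[str, Any]:
    """Run one pass of Perceive -> Understand -> Act."""
    text = transcribe(audio)          # Perceive: audio -> transcript
    intent = detect_intent(text)      # Understand: transcript -> intent label
    tool = tools.get(intent)          # Act: only if a tool is registered
    result: Optional[Dict[str, Any]] = tool(text) if tool else None
    return {"transcript": text, "intent": intent, "result": result}


# Hypothetical stand-ins for the real models (assumptions for illustration).
def fake_transcribe(audio: Any) -> str:
    return "set a timer for five minutes"


def keyword_intent(text: str) -> str:
    return "set_timer" if "timer" in text else "unknown"


demo = run_agent(
    b"",  # placeholder audio; the stand-in transcriber ignores it
    fake_transcribe,
    keyword_intent,
    {"set_timer": lambda text: {"status": "ok"}},
)
print(demo)
```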

---
## 🧱 Pipeline

1. **Audio capture** (Gradio mic/upload, 16 kHz)
2. **ASR**: `openai/whisper-small`
3. **Intent detection**: zero-shot text classification over a user-editable list of intents
   *(e.g., `turn_on_lights, start_music, set_timer, create_note, open_calendar`)*
4. **Tool layer** (mock functions in this Space) → returns a JSON “execution log”.
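
Steps 3 and 4 above could look roughly like the sketch below: parse the comma-separated intent list, then dispatch the chosen intent to a mock tool and return a JSON execution log. Everything here is an assumption made for illustration, including the function names (`parse_intents`, `execute`), the log fields, and the `threshold` confidence cut-off, which the original description does not mention; the Space's real mock functions may differ.

```python
import json
from typing import Any, Callable, Dict, List


def parse_intents(raw: str) -> List[str]:
    """Split a user-edited, comma-separated intent list into clean labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Mock tools (hypothetical): each takes the utterance and returns a small dict.
def turn_on_lights(utterance: str) -> Dict[str, Any]:
    return {"device": "lights", "state": "on"}


def set_timer(utterance: str) -> Dict[str, Any]:
    return {"timer": "started", "request": utterance}


TOOLS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "turn_on_lights": turn_on_lights,
    "set_timer": set_timer,
}


def execute(intent: str, score: float, utterance: str, threshold: float = 0.5) -> str:
    """Run the tool for `intent` if confident enough; return a JSON log string."""
    log: Dict[str, Any] = {"intent": intent, "score": score, "output": None}
    if score < threshold:
        log["status"] = "skipped_low_confidence"   # assumed policy, not from the Space
    elif intent not in TOOLS:
        log["status"] = "no_tool"
    else:
        log["status"] = "executed"
        log["output"] = TOOLS[intent](utterance)
    return json.dumps(log, ensure_ascii=False)


intents = parse_intents("turn_on_lights, start_music, set_timer, create_note, open_calendar")
print(execute("turn_on_lights", 0.91, "turn the lights on please"))
```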

---
## 🧪 Try it
1) Click **Record** and say something like:
   - “turn the lights on please”
   - “open my calendar next Tuesday”
   - “set a timer for five minutes”
2) Or upload a short `.wav` or `.mp3` file.
3) Check the **Top-k intents**, **Chosen intent**, and **Action result** outputs.

---
## 🧩 Models & Libraries
- ASR: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
- Zero-shot intent: `transformers` pipeline (`facebook/bart-large-mnli` by default)
- UI: [Gradio](https://www.gradio.app/) on Hugging Face Spaces
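
One plausible way to wire the two models listed above uses the `transformers` `pipeline` factory. This is a sketch under assumptions rather than the Space's code: the helper names (`get_asr`, `get_classifier`, `top_k`, `transcribe_and_classify`) are hypothetical, the pipelines are created lazily so that importing the module does not download any weights, and passing a file path to the ASR pipeline assumes `ffmpeg` is available for decoding.

```python
from functools import lru_cache
from typing import Any, Dict, List, Tuple

ASR_MODEL = "openai/whisper-small"
ZERO_SHOT_MODEL = "facebook/bart-large-mnli"


@lru_cache(maxsize=None)
def get_asr():
    # Imported lazily: nothing is downloaded until the first call.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=ASR_MODEL)


@lru_cache(maxsize=None)
def get_classifier():
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=ZERO_SHOT_MODEL)


def top_k(result: Dict[str, Any], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs from a zero-shot result."""
    pairs = sorted(
        zip(result["labels"], result["scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return pairs[:k]


def transcribe_and_classify(
    audio_path: str, intents: List[str], k: int = 3
) -> Tuple[str, List[Tuple[str, float]]]:
    """Transcribe an audio file, then rank the candidate intents for the transcript."""
    text = get_asr()(audio_path)["text"].strip()
    result = get_classifier()(text, candidate_labels=intents)
    return text, top_k(result, k)
```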

---
## ⚙️ Requirements
This Space uses `requirements.txt`:
```txt
transformers>=4.41.0
torch
torchaudio
gradio>=4.0.0
librosa
soundfile