---
title: Voice Agent – Speech → Intent → Tools
emoji: 🎤
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - speech-recognition
  - whisper
  - intent-detection
  - ai-agent
  - gradio
---
# 🎤 Voice Agent

Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.

Live demo: open the app in this Space.
## 🔍 Abstract

Voice Agent turns short speech snippets into actions. It:

- transcribes audio with Whisper,
- infers the intent from the text (zero-shot),
- optionally executes a tool (e.g., `turn_on_lights`, `set_timer`).

This showcases an AI agent loop: Perceive → Understand → Act.
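That loop can be sketched as a tiny driver function. The `perceive`/`understand`/`act` callables below are illustrative stubs, not the Space's actual code — the real app swaps in Whisper ASR and the zero-shot classifier:

```python
# Perceive → Understand → Act, wired as plain callables.
def run_agent(audio, perceive, understand, act):
    transcript = perceive(audio)     # Perceive: audio -> text
    intent = understand(transcript)  # Understand: text -> intent label
    return act(intent)               # Act: intent -> execution result

# Mock stages for illustration only.
result = run_agent(
    audio=b"<raw pcm bytes>",
    perceive=lambda a: "set a timer for five minutes",
    understand=lambda t: "set_timer" if "timer" in t else "unknown",
    act=lambda i: {"tool": i, "status": "ok"},
)
# result == {"tool": "set_timer", "status": "ok"}
```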
## 🧱 Pipeline

- Audio capture (Gradio mic/upload, 16 kHz)
- ASR: `openai/whisper-small`
- Intent detection: zero-shot text classification over a user-editable list of intents (e.g., `turn_on_lights`, `start_music`, `set_timer`, `create_note`, `open_calendar`)
- Tool layer (mock functions in this Space) → returns a JSON “execution log”
## 🧪 Try it

- Click **Record** and say something like:
  - “turn the lights on please”
  - “open my calendar next Tuesday”
  - “set a timer for five minutes”
- Or upload a short `.wav`/`.mp3`.
- See **Top-k intents**, **Chosen intent**, and **Action result**.
## 🧩 Models & Libraries

- ASR: `openai/whisper-small`
- Zero-shot intent: `transformers` `pipeline` (`facebook/bart-large-mnli` by default)
- UI: Gradio on Hugging Face Spaces
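The zero-shot pipeline returns a dict with parallel `labels`/`scores` lists already sorted by descending score, so deriving the top-k intents and the chosen intent takes only a few lines. A sketch — the scores below are invented for illustration:

```python
# Real usage (downloads the model, so shown here as comments):
#   from transformers import pipeline
#   classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
#   out = classifier(transcript, candidate_labels=intents)

def top_k_intents(zs_output: dict, k: int = 3):
    """Return (top-k (label, score) pairs, chosen intent) from a zero-shot result."""
    pairs = list(zip(zs_output["labels"], zs_output["scores"]))
    return pairs[:k], pairs[0][0]

# Shape of a zero-shot pipeline result (scores made up for the example):
out = {
    "sequence": "turn the lights on please",
    "labels": ["turn_on_lights", "start_music", "set_timer"],
    "scores": [0.92, 0.05, 0.03],
}
topk, chosen = top_k_intents(out, k=2)
# chosen == "turn_on_lights"
```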
## ⚙️ Requirements

This Space uses `requirements.txt`:

    transformers>=4.41.0
    torch
    torchaudio
    gradio>=4.0.0
    librosa
    soundfile

