---
title: "Voice Agent: Speech → Intent → Tools"
emoji: 🎤
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - speech-recognition
  - whisper
  - intent-detection
  - ai-agent
  - gradio
---

# 🎤 Voice Agent

Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.
Live demo: Open the app ↗

*(UI screenshot)*

## 🔍 Abstract

Voice Agent turns short speech snippets into actions. It:

  1. transcribes audio with Whisper,
  2. infers the intent from the text (zero-shot),
  3. optionally executes a tool (e.g., “turn_on_lights”, “set_timer”).

This showcases an AI agent loop: Perceive → Understand → Act.
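The Perceive → Understand → Act loop can be sketched as three injectable stages. This is a hypothetical sketch, not the actual `app.py` wiring; the function and parameter names (`agent_loop`, `perceive`, `understand`, `act`) are illustrative:

```python
from typing import Any, Callable, Dict

def agent_loop(
    perceive: Callable[[Any], str],        # audio -> transcript
    understand: Callable[[str], str],      # transcript -> intent label
    act: Callable[[str], Dict[str, Any]],  # intent -> execution result
    audio: Any,
) -> Dict[str, Any]:
    """Run one Perceive -> Understand -> Act cycle and return a combined log."""
    transcript = perceive(audio)
    intent = understand(transcript)
    result = act(intent)
    return {"transcript": transcript, "intent": intent, "result": result}

# Stub stages for demonstration; the real stages would call Whisper and a
# zero-shot classifier instead of these lambdas.
log = agent_loop(
    perceive=lambda _: "turn the lights on please",
    understand=lambda text: "turn_on_lights" if "lights" in text else "unknown",
    act=lambda intent: {"executed": intent, "status": "ok"},
    audio=None,
)
```

Keeping the stages injectable makes each one swappable and unit-testable in isolation.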


## 🧱 Pipeline

*(Pipeline diagram)*

  1. Audio capture (Gradio mic/upload, 16 kHz)
  2. ASR: openai/whisper-small
  3. Intent detection: zero-shot text classification over a user-editable list of intents
    (e.g., turn_on_lights, start_music, set_timer, create_note, open_calendar)
  4. Tool layer (mock functions in this Space) → returns a JSON “execution log”.
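The tool layer (step 4) can be sketched as a dispatch table of mock functions that returns a JSON execution log. A minimal sketch, assuming intent labels matching the list above; the tool outputs here are invented for illustration:

```python
import json
from typing import Any, Callable, Dict

# Mock tools keyed by intent label (hypothetical payloads; the Space's real
# mocks may return different fields).
TOOLS: Dict[str, Callable[[], Dict[str, Any]]] = {
    "turn_on_lights": lambda: {"device": "lights", "state": "on"},
    "set_timer":      lambda: {"timer": "started"},
    "open_calendar":  lambda: {"app": "calendar", "opened": True},
}

def execute(intent: str) -> str:
    """Dispatch an intent to its mock tool and return a JSON execution log."""
    tool = TOOLS.get(intent)
    if tool is None:
        return json.dumps({"intent": intent, "status": "no_tool"})
    return json.dumps({"intent": intent, "status": "ok", "output": tool()})
```

Returning JSON keeps the log easy to display verbatim in the Gradio UI.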

## 🧪 Try it

  1. Click Record and say something like:
    • “turn the lights on please”
    • “open my calendar next Tuesday”
    • “set a timer for five minutes”
  2. Or upload a short .wav/.mp3.
  3. See Top-k intents, Chosen intent, and Action result.

## 🧩 Models & Libraries

  • ASR: openai/whisper-small
  • Zero-shot intent: transformers pipeline (facebook/bart-large-mnli by default)
  • UI: Gradio on Hugging Face Spaces
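Wiring the two models together with `transformers` pipelines looks roughly like the sketch below. Model names come from this README; `build_pipelines` and `top_k_intents` are hypothetical helper names, and loading is deferred because the weights total roughly 2 GB:

```python
from typing import Dict, List, Tuple

def top_k_intents(zs_output: Dict, k: int = 3) -> List[Tuple[str, float]]:
    """Pair labels with scores from a zero-shot-classification result.
    The pipeline returns labels already sorted by score, highest first."""
    return list(zip(zs_output["labels"], zs_output["scores"]))[:k]

def build_pipelines():
    """Lazily construct the ASR and zero-shot pipelines so importing this
    module stays cheap; both tasks are standard transformers pipeline tasks."""
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    zs = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    return asr, zs

# Usage (commented out to avoid downloading the model weights here):
# asr, zs = build_pipelines()
# text = asr("clip.wav")["text"]
# result = zs(text, candidate_labels=["turn_on_lights", "set_timer", "open_calendar"])
# print(top_k_intents(result))
```

The "Top-k intents" shown in the UI correspond to the label/score pairs that `top_k_intents` extracts from the classifier output.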

## ⚙️ Requirements

This Space installs its dependencies from `requirements.txt`:

```
transformers>=4.41.0
torch
torchaudio
gradio>=4.0.0
librosa
soundfile
```