hudaakram committed on
Commit 4f7d1b7 · verified · 1 Parent(s): 1770994

Create README.md

Files changed (1)
  1. README.md +67 -8
README.md CHANGED
@@ -1,14 +1,73 @@
1
  ---
2
- title: Voice Agent
3
- emoji: 🌖
4
- colorFrom: blue
5
- colorTo: red
6
  sdk: gradio
7
- sdk_version: 5.45.0
8
  app_file: app.py
9
  pinned: false
10
- license: mit
11
- short_description: simple tool
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Voice Agent – Speech → Intent → Tools
3
+ emoji: 🎤
4
+ colorFrom: purple
5
+ colorTo: indigo
6
  sdk: gradio
7
  app_file: app.py
8
  pinned: false
9
+ license: apache-2.0
10
+ tags:
11
+ - speech-recognition
12
+ - whisper
13
+ - intent-detection
14
+ - ai-agent
15
+ - gradio
16
  ---
17
 
18
+ # 🎤 Voice Agent
19
+ **Speak or upload audio → transcript via Whisper → zero-shot intent → tool execution.**
20
+ Live demo: **[Open the app ↗](https://huggingface.co/spaces/hudaakram/Voice_Agent)**
21
+
22
+ ![UI](assets/ui.png)
23
+
24
+ ---
25
+
26
+ ## 🔍 Abstract
27
+ Voice Agent turns short speech snippets into **actions**. It:
28
+ 1) transcribes audio with Whisper,
29
+ 2) infers the **intent** from the text (zero-shot),
30
+ 3) optionally **executes a tool** (e.g., “turn_on_lights”, “set_timer”).
31
+
32
+ This showcases an **AI agent** loop: *Perceive → Understand → Act*.
33
+
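The *Perceive → Understand → Act* loop described in the abstract can be sketched as a plain function that chains three stages. This is a minimal illustration, not the code in `app.py`: the name `run_agent` and the three callables passed to it are hypothetical, and each stage is injected so that real models or mocks can be swapped in.

```python
from typing import Any, Callable, Dict


def run_agent(
    audio: Any,
    transcribe: Callable[[Any], str],
    detect_intent: Callable[[str], str],
    act: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Chain the three stages (all names here are hypothetical)."""
    transcript = transcribe(audio)          # Perceive: audio -> text
    intent = detect_intent(transcript)      # Understand: text -> intent label
    result = act(intent, transcript)        # Act: intent -> tool result
    return {"transcript": transcript, "intent": intent, "result": result}
```

Because every stage is an argument, the loop can be exercised with stand-in functions before any model is loaded.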
34
+ ---
35
+
36
+ ## 🧱 Pipeline
37
+ ![diagram](assets/diagram.png)
38
+
39
+ 1. **Audio capture** (Gradio mic/upload, 16 kHz)
40
+ 2. **ASR**: `openai/whisper-small`
41
+ 3. **Intent detection**: zero-shot text classification over a user-editable list of intents
42
+ *(e.g., `turn_on_lights, start_music, set_timer, create_note, open_calendar`)*
43
+ 4. **Tool layer** (mock functions in this Space) → returns a JSON “execution log”.
44
+
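Step 4 above can be pictured as a small dispatch table. The sketch below is an assumption about how a mock tool layer might look, not the Space's actual implementation: `make_mock_tool`, `TOOLS`, `execute_intent`, and the fields of the JSON log are all hypothetical, and only the intent labels come from the README.

```python
import json
from typing import Callable, Dict


def make_mock_tool(name: str) -> Callable[[str], dict]:
    """Build a mock tool that only reports what it would have done."""
    def tool(utterance: str) -> dict:
        return {"tool": name, "status": "ok", "utterance": utterance}
    return tool


# Intent labels taken from the README's example list; the tools are mocks.
TOOLS: Dict[str, Callable[[str], dict]] = {
    label: make_mock_tool(label)
    for label in (
        "turn_on_lights",
        "start_music",
        "set_timer",
        "create_note",
        "open_calendar",
    )
}


def execute_intent(intent: str, utterance: str) -> str:
    """Run the tool registered for `intent` and return a JSON execution log."""
    tool = TOOLS.get(intent)
    if tool is None:
        log = {"intent": intent, "executed": False, "reason": "no tool registered"}
    else:
        log = {"intent": intent, "executed": True, "output": tool(utterance)}
    return json.dumps(log)
```

Returning a JSON string keeps the result easy to display in a Gradio text or JSON component, and an unknown intent yields a log entry rather than an exception.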
45
+ ---
46
+
47
+ ## 🧪 Try it
48
+ 1) Click **Record** and say something like:
49
+ - “turn the lights on please”
50
+ - “open my calendar next Tuesday”
51
+ - “set a timer for five minutes”
52
+ 2) Or upload a short `.wav` or `.mp3` file.
53
+ 3) See **Top-k intents**, **Chosen intent**, and **Action result**.
54
+
55
+ ---
56
+
57
+ ## 🧩 Models & Libraries
58
+ - ASR: [`openai/whisper-small`](https://huggingface.co/openai/whisper-small)
59
+ - Zero-shot intent: `transformers` pipeline (`facebook/bart-large-mnli` by default)
60
+ - UI: [Gradio](https://www.gradio.app/) on Hugging Face Spaces
61
+
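A rough sketch of the zero-shot intent step, assuming the `transformers` `zero-shot-classification` pipeline with `facebook/bart-large-mnli` as listed above. The helper names `parse_intents`, `top_k_intents`, and `load_classifier` are hypothetical, and treating the user-editable intent list as a comma-separated string is an assumption based on the example in the Pipeline section.

```python
from typing import Any, Callable, List, Tuple


def parse_intents(raw: str) -> List[str]:
    """Split a comma-separated intent string into clean labels (assumed format)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def top_k_intents(
    classifier: Callable[..., Any], text: str, labels: List[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs for `text`."""
    out = classifier(text, candidate_labels=labels)
    # The pipeline returns parallel "labels" and "scores" lists; sort to be safe.
    pairs = sorted(zip(out["labels"], out["scores"]), key=lambda p: p[1], reverse=True)
    return pairs[:k]


def load_classifier(model_name: str = "facebook/bart-large-mnli"):
    """Load the real pipeline; imported lazily because it downloads the model."""
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)
```

With the model available, usage would look like `top_k_intents(load_classifier(), "set a timer for five minutes", parse_intents("turn_on_lights, set_timer"))`; the first pair in the returned list is the natural candidate for the chosen intent.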
62
+ ---
63
+
64
+ ## ⚙️ Requirements
65
+ This Space uses `requirements.txt`:
66
+
67
+ ```txt
68
+ transformers>=4.41.0
69
+ torch
70
+ torchaudio
71
+ gradio>=4.0.0
72
+ librosa
73
+ soundfile