---
title: PlotWeaver Voice Agent
emoji: 🗣️
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
short_description: Hausa voice AI for African banks, telecoms, and delivery
license: apache-2.0
---

# PlotWeaver Voice Agent

Hausa-first conversational AI demo. Product 7 of the PlotWeaver suite: voice bots for WhatsApp, phone, and customer support across African banks, telecoms, and delivery services.

## What it does

- **ASR**: Whisper-small transcribes your Hausa audio
- **NLU**: Hybrid three-tier system: rule-based keyword fast path → Qwen2.5-1.5B-Instruct zero-shot classifier for paraphrases → rule-based safety fallback. The pipeline trace shows which tier answered each turn.
- **Dialogue manager**: deterministic FSM across 3 verticals (Bank, Telecom, Delivery)
- **TTS**: `facebook/mms-tts-hau` synthesizes the bot's Hausa response
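
The three-tier NLU cascade can be pictured as a function that tries each tier in order and reports which one answered. The following is a minimal sketch under stated assumptions, not the Space's code. The intent labels and the `KEYWORDS` table are hypothetical; only the trigger phrases come from the demo flows below. The LLM tier is represented by an injected callable, because this README does not describe how Qwen2.5-1.5B-Instruct is prompted.

```python
from typing import Callable, Optional

# Hypothetical intent labels; trigger phrases are taken from the demo flows.
# "escalate" is listed first so a handoff request wins over any other match.
KEYWORDS: dict[str, list[str]] = {
    "escalate": ["mutum", "wakili"],
    "check_balance": ["duba ma'auni"],
    "buy_airtime": ["saya airtime"],
    "track_order": ["bincika oda"],
}


def classify(text: str, llm: Optional[Callable[[str], str]] = None) -> tuple[str, str]:
    """Return (intent, tier) so the pipeline trace can report which tier answered."""
    normalized = text.lower().strip()

    # Tier 1: rule-based keyword fast path (plain substring match, no model call).
    for intent, phrases in KEYWORDS.items():
        if any(phrase in normalized for phrase in phrases):
            return intent, "keyword"

    # Tier 2: zero-shot LLM classifier for paraphrases. Here it is any callable
    # that maps text to a label; only labels we already know are accepted.
    if llm is not None:
        try:
            label = llm(normalized).strip()
        except Exception:
            label = ""  # treat model errors like an unusable answer
        if label in KEYWORDS:
            return label, "llm"

    # Tier 3: safety fallback; never act on an intent the user did not clearly express.
    return "fallback", "safety"
```

The fallback sits last by design. An unrecognized or malformed LLM answer degrades to "I didn't understand" rather than triggering a transfer or top-up.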

## How to use

1. Pick a vertical (Bank / Telecom / Delivery)
2. Three ways to talk to the agent:
   - **Type** a Hausa phrase in the text box
   - **Record** via the browser microphone
   - **Upload** a pre-recorded Hausa audio file (.wav, .mp3, or .ogg; up to 30s)
3. For audio, click "Transcribe & send" after recording or uploading
4. Watch the pipeline trace on the left: session load, ASR, NLU, dialogue manager, TTS
5. The bot's audio response autoplays; full multi-turn flows work (balance check, transfers, complaints, rescheduling, etc.)
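
One way to produce a per-turn pipeline trace like the one in step 4 is to time each stage as it runs. The sketch below assumes a context-manager-per-stage design. `TurnTrace` and its methods are hypothetical names, not the Space's actual code. The `info` dict lets a stage attach a detail, such as which NLU tier answered.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class TurnTrace:
    """Records (stage, elapsed milliseconds, detail) rows for one conversational turn."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, float, str]] = []

    @contextmanager
    def stage(self, name: str) -> Iterator[dict[str, str]]:
        info: dict[str, str] = {}
        start = time.perf_counter()
        try:
            yield info  # the stage body may set info["detail"], e.g. the NLU tier
        finally:
            # Recorded even if the stage raises, so failures still show up in the trace.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.rows.append((name, elapsed_ms, info.get("detail", "")))

    def render(self) -> str:
        return "\n".join(f"{name:<16} {ms:8.1f} ms  {detail}" for name, ms, detail in self.rows)


if __name__ == "__main__":
    trace = TurnTrace()
    for name in ["session load", "ASR", "NLU", "dialogue manager", "TTS"]:
        with trace.stage(name) as info:
            if name == "NLU":
                info["detail"] = "tier=keyword"  # placeholder; real code would run the model here
    print(trace.render())
```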

## Demo flows

**Bank**: "duba ma'auni" → "1234" → bot returns your balance.

**Telecom**: "saya airtime" → "1000" → airtime loaded.

**Delivery**: "bincika oda" → "10234" → order status.

**Escalation**: say "mutum" or "wakili" at any time to flag a human handoff.
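
Each demo flow is a two-turn exchange: a trigger phrase moves the dialogue into a "waiting for a number" state, and a numeric reply completes it. Escalation is checked before anything else, so it works from any state. The sketch below is a minimal deterministic FSM under these assumptions. The state names, intent names and reply codes are hypothetical placeholders; the real bot answers in synthesized Hausa. Trigger matching is inlined so the sketch runs on its own, whereas in the Space the NLU tiers would supply the intent.

```python
from dataclasses import dataclass, field
from typing import Optional

ESCALATION_WORDS = ("mutum", "wakili")

# Hypothetical tables: trigger phrases come from the demo flows, while intent
# names and reply codes are placeholders for the bot's Hausa responses.
TRIGGERS = {
    "duba ma'auni": "check_balance",
    "saya airtime": "buy_airtime",
    "bincika oda": "track_order",
}
COMPLETIONS = {
    "check_balance": "BALANCE_RETURNED",
    "buy_airtime": "AIRTIME_LOADED",
    "track_order": "ORDER_STATUS",
}


@dataclass
class Session:
    state: str = "idle"  # idle -> awaiting_number -> idle; any state -> handoff
    intent: Optional[str] = None
    slots: dict[str, str] = field(default_factory=dict)


def step(session: Session, text: str) -> str:
    """Advance the FSM by one user turn and return a reply code."""
    normalized = text.lower().strip()

    # Escalation is checked first so it works from every state.
    if any(word in normalized for word in ESCALATION_WORDS):
        session.state = "handoff"
        return "HUMAN_HANDOFF_FLAGGED"

    if session.state == "idle":
        for phrase, intent in TRIGGERS.items():
            if phrase in normalized:
                session.state, session.intent = "awaiting_number", intent
                return "ASK_NUMBER"  # e.g. a code, an amount, or an order ID
        return "NOT_UNDERSTOOD"

    if session.state == "awaiting_number":
        if not normalized.isdigit():
            return "ASK_NUMBER_AGAIN"  # stay in the same state and re-prompt
        intent = session.intent
        session.slots[intent] = normalized
        session.state, session.intent = "idle", None
        return COMPLETIONS[intent]

    return "HANDOFF_PENDING"  # handoff is terminal in this sketch
```

For example, the Bank flow is `step(s, "duba ma'auni")` followed by `step(s, "1234")`, which ends on `"BALANCE_RETURNED"`. Because every transition is an explicit branch, the same inputs always produce the same replies. That is the property a deterministic dialogue manager is chosen for.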

## Architecture

```
User (WhatsApp/Phone/Web)
      ↓
ASR (Whisper) → NLU (XLM-R) → Dialogue FSM → Response Gen → TTS (MMS)
                                   ↓                            ↓
                   Session state (Redis, 10min TTL)         Bot audio
```
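
The diagram keeps session state in Redis with a 10-minute TTL, so an abandoned conversation simply expires. The sketch below uses an in-memory stand-in that mimics that expiry behaviour, so it runs without a Redis server. The class name and the injectable clock are assumptions made for illustration and testing. With redis-py, the analogous write would be `setex(key, 600, value)`.

```python
import json
import time
from typing import Callable, Optional

SESSION_TTL_SECONDS = 10 * 60  # 10-minute TTL from the architecture diagram


class TTLSessionStore:
    """In-memory stand-in for a Redis session store with SETEX-style expiry."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}

    def save(self, session_id: str, state: dict) -> None:
        # Every write refreshes the expiry, as SETEX does when a key is rewritten.
        self._data[session_id] = (self.clock() + self.ttl, json.dumps(state))

    def load(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazily evict the expired session
            return None
        return json.loads(payload)
```

Serializing the state to JSON keeps the stand-in honest: Redis stores strings, so anything placed in the session must survive a round trip.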

## Notes

The first turn takes ~30-60s while the ASR and TTS models (~640MB total) cold-start. The Qwen2.5-1.5B NLU model (~3GB) loads only when an utterance doesn't match the rule-based keyword set, so common phrases stay fast. The first novel phrasing triggers a one-time 30-40s LLM load; after that, each LLM call takes ~5-8s on CPU.
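
The "load only on a keyword miss" behaviour is a lazy-initialization pattern. The sketch below makes these assumptions: `LazyModel` and `route` are hypothetical names, and the loader is any zero-argument callable. In the Space that callable would load Qwen2.5-1.5B-Instruct, but that code is not shown in this README. The lock prevents two concurrent turns from both paying the ~3GB load.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defers an expensive load until the first call that actually needs the model."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        # Double-checked locking so concurrent turns don't trigger two loads.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


def route(text: str, keywords: dict[str, str],
          llm: LazyModel[Callable[[str], str]]) -> str:
    """Keyword fast path first; only a miss touches (and possibly loads) the LLM."""
    normalized = text.lower()
    for phrase, intent in keywords.items():
        if phrase in normalized:
            return intent  # fast path: the LLM is never loaded for this turn
    classifier = llm.get()  # the first miss pays the one-time load cost
    return classifier(normalized)
```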

For production, a GPU Space or dedicated endpoint brings full-turn latency under 1s.

This is a proof-of-concept (POC) demo. The production plan covers a fine-tuned Hausa Whisper, a fine-tuned XLM-R or AfroXLMR NLU classifier (replacing the LLM for consistent sub-100ms NLU), live WhatsApp Business Cloud integration, and Twilio Voice.