---
title: PlotWeaver Voice Agent
emoji: 🗣️
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
short_description: Hausa voice AI for African banks, telecoms, and delivery
license: apache-2.0
---

# PlotWeaver Voice Agent

Hausa-first conversational AI demo. Product 7 of the PlotWeaver suite: voice bots for WhatsApp, phone, and customer support across African banks, telecoms, and delivery services.

## What it does

- **ASR**: Whisper-small transcribes your Hausa audio
- **NLU**: Hybrid three-tier system: rule-based keyword fast path → Qwen2.5-1.5B-Instruct zero-shot classifier for paraphrases → rule-based safety fallback. The pipeline trace shows which tier answered each turn.
- **Dialogue manager**: deterministic FSM across 3 verticals (Bank, Telecom, Delivery)
- **TTS**: `facebook/mms-tts-hau` synthesizes the bot's Hausa response
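
The three-tier NLU routing described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the `KEYWORDS` table, the intent names, and the `llm_classifier` callable are all hypothetical stand-ins (the real keyword set and the Qwen2.5 prompt are not shown in this README).

```python
from typing import Callable, Optional, Tuple

# Hypothetical keyword table built from the demo phrases in this README.
# The intent names are illustrative assumptions.
KEYWORDS = {
    "check_balance": ["duba ma'auni"],
    "buy_airtime": ["saya airtime"],
    "track_order": ["bincika oda"],
    "human_handoff": ["mutum", "wakili"],
}


def keyword_tier(text: str) -> Optional[str]:
    """Tier 1: return an intent if a known phrase occurs in the text."""
    lowered = text.lower()
    for intent, phrases in KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return intent
    return None


def classify(
    text: str,
    llm_classifier: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (intent, tier) so a pipeline trace can show which tier answered."""
    intent = keyword_tier(text)
    if intent is not None:
        return intent, "rules"

    # Tier 2: defer to a zero-shot classifier (assumed to return an intent name).
    if llm_classifier is not None:
        try:
            candidate = llm_classifier(text)
        except Exception:
            candidate = None
        # Only accept labels the dialogue manager knows about.
        if candidate in KEYWORDS:
            return candidate, "llm"

    # Tier 3: safety fallback when nothing matched or the LLM output was unusable.
    return "fallback", "safety_fallback"
```

Returning the tier alongside the intent keeps the routing decision visible to the caller, which is one way the trace could report which tier handled a turn.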

## How to use

1. Pick a vertical (Bank / Telecom / Delivery)
2. Three ways to talk to the agent:
   - **Type** a Hausa phrase in the text box
   - **Record** via browser microphone
   - **Upload** a pre-recorded Hausa audio file (.wav, .mp3, .ogg; up to 30s)
3. For audio, click "Transcribe & send" after recording/uploading
4. Watch the pipeline trace on the left: session load, ASR, NLU, dialogue manager, TTS
5. The bot's audio response autoplays; full multi-turn flows work (balance check, transfers, complaints, rescheduling, etc.)

## Demo flows

**Bank**: "duba ma'auni" → "1234" → bot returns your balance.

**Telecom**: "saya airtime" → "1000" → airtime loaded.

**Delivery**: "bincika oda" → "10234" → order status.

**Escalation**: say "mutum" or "wakili" at any time to flag a human handoff.
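
As a rough sketch of how a deterministic FSM could drive the Bank flow and the escalation rule above: the state names, reply tags, and the assumption that "1234" is a 4-digit code are all illustrative, since this README does not show the actual dialogue manager.

```python
from dataclasses import dataclass

# Escalation words taken from the demo flows above.
HANDOFF_WORDS = ("mutum", "wakili")


@dataclass
class BankSession:
    state: str = "idle"      # hypothetical states: "idle", "awaiting_code"
    handoff: bool = False    # set once the user asks for a human


def bank_step(session: BankSession, text: str) -> str:
    """Advance the session by one user turn and return a reply tag."""
    lowered = text.strip().lower()

    # Escalation is checked first so it works in any state.
    if any(word in lowered.split() for word in HANDOFF_WORDS):
        session.handoff = True
        session.state = "idle"
        return "handoff_flagged"

    if session.state == "idle":
        if "duba ma'auni" in lowered:
            session.state = "awaiting_code"
            return "ask_code"
        return "not_understood"

    if session.state == "awaiting_code":
        # Assumption: the follow-up is a 4-digit numeric code such as "1234".
        if lowered.isdigit() and len(lowered) == 4:
            session.state = "idle"
            return "report_balance"
        return "ask_code"

    return "not_understood"
```

Because each transition depends only on the current state and the input text, the same sequence of turns always produces the same replies, which is what makes the flow deterministic.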

## Architecture

```
User (WhatsApp/Phone/Web)
    ↓
ASR (Whisper) → NLU (XLM-R) → Dialogue FSM → Response Gen → TTS (MMS)
    ↓                                                          ↓
Session state (Redis, 10min TTL)                          Bot audio
```
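
The diagram's session state with a 10-minute TTL can be illustrated with a small in-memory stand-in. This sketch does not talk to Redis; the `TTLSessionStore` class and its method names are hypothetical, and it assumes the TTL is reset on every save but not on load, which the README does not specify.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLSessionStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(
        self,
        ttl_seconds: float = 600.0,  # 10 minutes, as in the diagram
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def save(self, session_id: str, state: Dict[str, Any]) -> None:
        """Store a copy of the state and restart its expiry timer."""
        self._data[session_id] = (self.clock() + self.ttl, dict(state))

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Return the stored state, or None if it is missing or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # drop expired sessions lazily
            return None
        return dict(state)
```

With a real Redis backend, the expiry would normally be delegated to the server's own key expiration rather than checked in application code.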

## Notes

The first turn takes ~30-60s while the ASR and TTS models cold-start (~640MB total). The Qwen2.5-1.5B NLU model (~3GB) loads only when a user utterance doesn't match the rule-based keyword set, so common phrases stay fast, while the first novel phrasing triggers a one-time 30-40s LLM load (then ~5-8s per subsequent LLM call on CPU).
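
The load-on-first-miss behaviour can be sketched with a small lazy wrapper. `LazyModel` and its `loader` argument are hypothetical names for illustration; in the demo the loader would be whatever function actually builds the Qwen2.5 model, which is not shown here.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defer an expensive load until the first call to get()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()
        self.load_count = 0  # exposed only to make the behaviour observable

    def get(self) -> T:
        if not self._loaded:
            # Double-checked locking so concurrent first calls load only once.
            with self._lock:
                if not self._loaded:
                    self._model = self._loader()
                    self.load_count += 1
                    self._loaded = True
        return self._model  # type: ignore[return-value]
```

Turns that the keyword tier handles never call `get()`, so they never pay the load cost; only the first unmatched utterance does.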

In production, a GPU Space or dedicated endpoint should bring full-turn latency under 1s.

This is a proof-of-concept demo. The production plan covers a fine-tuned Hausa Whisper, a fine-tuned XLM-R or AfroXLMR NLU classifier (replacing the LLM for consistent sub-100ms NLU), live WhatsApp Business Cloud integration, and Twilio Voice.