Toadoum

Upload 4 files

c3397c4 verified 23 days ago

	---
	title: PlotWeaver Voice Agent
	emoji: 🗣️
	colorFrom: green
	colorTo: blue
	sdk: gradio
	sdk_version: 4.44.1
	app_file: app.py
	pinned: true
	short_description: Hausa voice AI for African banks, telecoms, and delivery
	license: apache-2.0
	---

	# PlotWeaver Voice Agent

	Hausa-first conversational AI demo. Product 7 of the PlotWeaver suite: voice bots for WhatsApp, phone, and customer support across African banks, telecoms, and delivery services.

	## What it does

	- ASR: Whisper-small transcribes your Hausa audio
	- NLU: Hybrid three-tier system — rule-based keyword fast path → Qwen2.5-1.5B-Instruct zero-shot classifier for paraphrases → rule-based safety fallback. The pipeline trace shows which tier answered each turn.
	- Dialogue manager: deterministic FSM across 3 verticals (Bank, Telecom, Delivery)
	- TTS: `facebook/mms-tts-hau` synthesizes the bot's Hausa response
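
The three-tier NLU routing described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the intent names, the `KEYWORDS` table, the `classify` function, and the `llm_classify` callable (a stand-in for the Qwen2.5-1.5B-Instruct zero-shot call) are all hypothetical, and the fallback behavior shown (return a safe default intent when the LLM fails or answers with an unknown label) is an assumption about what "safety fallback" means.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword table; the real app's phrases and intent names may differ.
KEYWORDS = {
    "duba ma'auni": "check_balance",
    "saya airtime": "buy_airtime",
    "bincika oda": "track_order",
}
ALLOWED_INTENTS = set(KEYWORDS.values())
FALLBACK_INTENT = "fallback"


@dataclass
class NluResult:
    intent: str
    tier: str  # which tier answered, e.g. for a pipeline trace


def keyword_tier(text: str) -> Optional[str]:
    """Tier 1: substring match against the known phrases."""
    lowered = text.lower()
    for phrase, intent in KEYWORDS.items():
        if phrase in lowered:
            return intent
    return None


def classify(text: str, llm_classify: Callable[[str], str]) -> NluResult:
    """Run the tiers in order and report which one produced the intent."""
    intent = keyword_tier(text)
    if intent is not None:
        return NluResult(intent, "rules")
    # Tier 2: zero-shot LLM call (stubbed by the caller); accept only known labels.
    try:
        guess = llm_classify(text).strip()
    except Exception:
        guess = ""
    if guess in ALLOWED_INTENTS:
        return NluResult(guess, "llm")
    # Tier 3 (assumed behavior): safe default when the LLM fails or is off-label.
    return NluResult(FALLBACK_INTENT, "fallback")
```

Passing the LLM as a callable keeps the fast path free of any model dependency, so the keyword tier can be exercised without loading weights.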

	## How to use

	1. Pick a vertical (Bank / Telecom / Delivery)
	2. Three ways to talk to the agent:
	   - Type a Hausa phrase in the text box
	   - Record via browser microphone
	   - Upload a pre-recorded Hausa audio file (.wav, .mp3, or .ogg, up to 30s)
	3. For audio, click "Transcribe & send" after recording/uploading
	4. Watch the pipeline trace on the left — session load, ASR, NLU, dialogue manager, TTS
	5. The bot's audio response autoplays; full multi-turn flows work (balance check, transfers, complaints, rescheduling, etc.)

	## Demo flows

	Bank: "duba ma'auni" → "1234" → bot returns your balance.

	Telecom: "saya airtime" → "1000" → airtime loaded.

	Delivery: "bincika oda" → "10234" → order status.

	Escalation: say "mutum" or "wakili" at any time to flag a human handoff.
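
As a rough illustration of how a deterministic FSM could drive the Bank flow and the escalation rule above, here is a minimal sketch. The state names, reply keys, `Session` class, and `bank_turn` function are hypothetical, and treating "1234" as a numeric code the bot asks for is an assumption; only the trigger phrase and the escalation words come from this README.

```python
from dataclasses import dataclass

ESCALATION_WORDS = {"mutum", "wakili"}  # escalation triggers listed in the README


@dataclass
class Session:
    state: str = "idle"
    escalated: bool = False


def bank_turn(session: Session, text: str) -> str:
    """Advance the balance-check flow by one user turn; return a reply key."""
    lowered = text.lower()
    # Escalation is checked first so it works from any state.
    if any(word in ESCALATION_WORDS for word in lowered.split()):
        session.escalated = True
        session.state = "idle"
        return "handoff_flagged"
    if session.state == "idle":
        if "duba ma'auni" in lowered:
            session.state = "awaiting_code"
            return "ask_code"
        return "not_understood"
    if session.state == "awaiting_code":
        if text.strip().isdigit():
            session.state = "idle"
            return "balance_reported"
        return "ask_code"  # stay in the same state and re-prompt
    return "not_understood"
```

Because every transition depends only on the current state and the input text, the same sequence of utterances always yields the same sequence of replies.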

	## Architecture

	```
	User (WhatsApp/Phone/Web)
	      ↓
	ASR (Whisper) → NLU (XLM-R) → Dialogue FSM → Response Gen → TTS (MMS)
	                                   ↓                            ↓
	                   Session state (Redis, 10min TTL)         Bot audio
	```
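
The session-state box in the diagram implies that a conversation is forgotten after 10 minutes without activity. The sketch below shows that expiry behavior with an in-memory stand-in rather than a real Redis client; the `SessionStore` class, its method names, and the injectable `clock` parameter are hypothetical and exist only to make the idea testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 600  # 10 minutes, as in the diagram


class SessionStore:
    """In-memory stand-in for a Redis key with an expiry (illustrative only)."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def save(self, session_id: str, state: Any) -> None:
        """Store the state and restart the TTL countdown."""
        self._data[session_id] = (self._clock() + self._ttl, state)

    def load(self, session_id: str) -> Optional[Any]:
        """Return the stored state, or None if it is missing or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._data[session_id]
            return None
        return state
```

With an actual Redis deployment, the same effect is typically achieved by writing the serialized state with an expiry, for example redis-py's `set(key, value, ex=600)`.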

	## Notes

	The first turn takes ~30-60s while the ASR and TTS models (~640MB total) cold-start. The Qwen2.5-1.5B NLU model (~3GB) loads only when a user utterance doesn't match the rule-based keyword set, so common phrases stay fast. The first novel phrasing triggers a one-time 30-40s LLM load; each subsequent LLM call takes ~5-8s on CPU.
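
The load-on-first-miss behavior described above is a lazy-initialization pattern. A minimal sketch, assuming a hypothetical `LazyModel` wrapper and a caller-supplied `loader` function (in the real app this would be whatever code loads the Qwen model; it is not shown here):

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defer an expensive load until first use, then reuse the result."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        """Load on the first call; later calls return the cached object."""
        if self._model is None:
            with self._lock:  # avoid duplicate loads under concurrent requests
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

Only the code path that handles a keyword miss would call `get()`, so turns answered by the rule-based tier never pay the load cost. The sketch assumes the loader never returns `None`.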

	For production, a GPU Space or dedicated endpoint brings full turn latency under 1s.

	This is a proof-of-concept (POC) demo. The production plan covers a fine-tuned Hausa Whisper model, a fine-tuned XLM-R or AfroXLMR NLU classifier (replacing the LLM for consistent sub-100ms NLU), live WhatsApp Business Cloud integration, and Twilio Voice.