---
title: Sahel-Voice-Lab Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.25.0"
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
  - bambara
  - fula
  - speech-recognition
  - text-to-speech
  - agriculture
  - iot
  - language-learning
  - west-africa
  - low-resource-nlp
  - memory
---

# 🌍 Sahel-Voice-Lab

**A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).**

Two intertwined jobs:

1. **Memory loop** — users *teach* the assistant new words; it persists them to a HuggingFace dataset and uses them as the source of truth in future answers.
2. **Agricultural IoT voice interface** — Sahelian farmers query soil, weather, irrigation, and pest data in their own language and receive short answers (≤ 6 words per sentence) that synthesize cleanly in TTS.

The core stack is explicitly **100% non-Meta** (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.

---

## What this Space currently runs — the `ground-zero` minimal baseline

The deployed Space (`app_file: app_minimal.py`) is the **Month 1–3 rebuild**
baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field
testing and to build a real-user eval set. No LoRA adapters, no memory loop,
no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in
`app.py` still exists for the full production stack; it is just not what the
Space serves today.

Four stacked changes land dialect fidelity without any training:

1. **Stage 1 — dialect-pinned system prompt** (`src/llm/minimal_client.py`).
   Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose
   system prompt pins the target dialect explicitly — *Bambara as spoken in
   Bamako, Mali* and *Pular of Fuuta Jallon, as spoken in Guinea* — names the
   languages the model must **not** drift into (Wolof, Hausa, Pulaar of
   Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair
   bilingual gold list as few-shot anchoring
   (`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).

2. **Stage 2 — curated phrasebook short-circuit** (`src/llm/phrasebook.py`).
   Before calling the LLM, the user's input is normalised and fuzzy-matched
   (threshold 0.88) against a curated English-keyed phrasebook
   (`configs/dialect_anchors/{bambara,pular}_phrasebook.json` — 100 Bambara /
   110 Pular entries across greetings, family, food, farming, health,
   shopping, travel, clarity, time, parting). A hit returns the gold
   translation directly, with no LLM call and negligible added latency.

3. **Stage 3 — better multilingual base LLM.**
   The default `LLM_MODEL_ID` is now **`CohereLabs/aya-expanse-32b`**, a
   23-language multilingual model with much stronger West African coverage
   than Qwen2.5-7B. Override it via the `LLM_MODEL_ID` env var (e.g. to
   `Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
   available on your HF account.

4. **Stage 4 — split translate / reply UI + per-turn telemetry + RAG few-shot.**
   Both Voice and Text tabs use a 4-box layout: phrasebook translation (text
   + audio) is automatic on submit (no LLM), and a separate **Generate reply**
   button calls the dialect-anchored LLM for a conversational response. On a
   phrasebook miss the LLM is RAG-injected with the top-3 nearest curated
   pairs as additional style anchoring. Every turn is appended to
   `data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency
   breakdown, phrasebook hit, and reply — the substrate for hit-rate
   measurement, A/B comparisons, and eventual Stage-5 LoRA training-data
   curation. The system prompt now also explicitly tells the LLM to **reply,
   not translate** — the few-shot pairs are framed as style/orthography
   references only, fixing the "the LLM just echoes the phrasebook target"
   regression.

See `docs/baseline_rebuild.md` for the broader minimal-track plan.
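
The Stage 2 short-circuit and the Stage 4 top-3 fallback can be sketched as follows. This is a minimal illustration, not the code in `src/llm/phrasebook.py`: it assumes the standard-library `difflib` as the similarity measure, and the `PHRASEBOOK` dictionary, its placeholder targets, and the `normalise`, `score`, and `lookup` helpers are all hypothetical. Only the 0.88 threshold and the top-3 count come from the description above.

```python
import difflib
import re
import unicodedata

THRESHOLD = 0.88  # fuzzy-match threshold quoted in Stage 2

# Hypothetical stand-in for the curated phrasebook JSON (English-keyed).
# The targets are placeholders, not real translations.
PHRASEBOOK = {
    "good morning": "<gold translation 1>",
    "thank you very much": "<gold translation 2>",
    "where is the market": "<gold translation 3>",
    "how is your family": "<gold translation 4>",
}


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def score(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is an assumed stand-in for the real matcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def lookup(query: str, k: int = 3):
    """Return (hit, anchors).

    hit is the gold translation when the best key clears THRESHOLD, else None.
    anchors is empty on a hit; on a miss it holds the k nearest (key, target)
    pairs, which a caller could inject into the LLM prompt as style anchors.
    """
    q = normalise(query)
    ranked = sorted(PHRASEBOOK.items(), key=lambda kv: score(q, kv[0]), reverse=True)
    best_key, best_target = ranked[0]
    if score(q, best_key) >= THRESHOLD:
        return best_target, []
    return None, ranked[:k]
```

On a hit the caller can skip the LLM entirely; on a miss, the returned pairs play the role of the RAG few-shot examples described in Stage 4.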

---

## Status

| Phase | Feature | State |
|------:|---------|-------|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS — Bambara | ✅ shipped |
| 2 | Waxal VITS TTS — Fula | ⏳ placeholder until `ous-sow/fula-tts` is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| — | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |

See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` for the parallel minimal-track strategy.

---

## Stack

| Layer | Tool |
|-------|------|
| STT | `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | `CohereLabs/aya-expanse-32b` (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient — overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen2.5-7B-Instruct`, Mistral, Zephyr |
| Dialect anchoring (minimal) | `src/llm/minimal_client.py` — pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | `src/llm/phrasebook.py` — 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | `facebook/mms-tts-bam`, `facebook/mms-tts-ful` |
| TTS (Bambara) | `ynnov/ekodi-bambara-tts-female` (Waxal VITS) |
| TTS (Fula) | placeholder → `ous-sow/fula-tts` when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 |
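
The speaker-ID row above names a cosine threshold of 0.75 over 192-d embeddings. The matching step could look roughly like the sketch below. It is written in plain Python and assumes the embeddings have already been extracted; the `identify` function and the profile dictionary are hypothetical and are not taken from `src/voice/`.

```python
import math
from typing import Dict, Optional, Sequence

MATCH_THRESHOLD = 0.75  # cosine threshold from the Stack table


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding: Sequence[float],
             profiles: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if nobody clears the threshold."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, reference in profiles.items():
        similarity = cosine(embedding, reference)
        if similarity >= best_score:
            best_name, best_score = name, similarity
    return best_name
```

The example works for any vector length, so 192-d ECAPA-TDNN embeddings fit without changes.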

---

## Four entry points (do not conflate)

| File | Purpose | Lifecycle |
|------|---------|-----------|
| `app_minimal.py` | **Minimal baseline Gradio UI** — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
| `src/api/app.py` | **FastAPI service** — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |

---

## Repository layout

```
app_minimal.py                 # Gradio (minimal baseline, currently served on the HF Space)
app.py                         # Gradio (full production, not currently served)
app_lab.py                     # Gradio (experimental)
requirements.txt               # Spaces runtime — do NOT pin torch/torchaudio
packages.txt                   # apt deps (ffmpeg)
configs/
  base_config.yaml             # shared settings
  api_config.yaml              # FastAPI-specific
  lora_bambara.yaml            # Bambara LoRA hyperparams
  lora_fula.yaml               # Fula LoRA hyperparams
data/
  phrases/                     # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl             # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md           # full architectural walkthrough + action plan
  baseline_rebuild.md          # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md    # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md          # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/       # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/              # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb            # legacy Colab trainer
scripts/
  train_bambara.py             # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py               # merge LoRA -> ONNX -> TFLite
  verify_baseline.py           # eval harness
  run_server.py                # FastAPI launcher
  run_data_pipeline.py         # dataset prep
  push_to_hf.sh                # deploy helpers
  push_to_kaggle.sh            # deploy helpers
  runpod_setup.sh
src/
  api/                         # FastAPI app, schemas, routes, middleware
  conversation/                # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                        # dataset loading + normalization (Adlam, Bambara)
  engine/                      # adapter_manager, transcriber, stt_processor, curiosity
  iot/                         # intent_parser, voice_responder, sensor_bridge
  llm/                         # LLM client wrappers
  memory/                      # vocabulary persistence
  optimization/                # ONNX / quantization helpers
  training/                    # trainer, callbacks, augmenters
  tts/                         # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                       # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                         # pytest — api, data pipeline, engine, iot
```

---

## How the memory loop works

1. Press **Push-to-Talk** → speak in Bambara, Fula, French, or English.
2. **Whisper** transcribes. If the language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
3. **Qwen** reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent ∈ {teaching, question, conversation, error}`.
4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and async-pushed to `ous-sow/sahel-agri-feedback → vocabulary.jsonl`.
5. If `question`: Qwen answers using the remembered vocabulary as source of truth.
6. If `conversation`: Qwen replies naturally.
7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

The last 5 learned words are always visible in the UI.
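
Steps 3 to 6 hinge on parsing the structured reply and routing on `intent`. A minimal sketch of that step follows. The field names `intent`, `reply`, `english`, and `teaching_pair` come from the design constraints later in this README; the shape of `teaching_pair`, the `parse_reply` and `handle_reply` helpers, and the in-memory `vocabulary` list (standing in for `data/vocabulary.jsonl`) are assumptions made for illustration.

```python
import json
import re

# Built at runtime so the example does not contain a literal code fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
VALID_INTENTS = {"teaching", "question", "conversation", "error"}

ERROR_REPLY = {"intent": "error", "reply": "", "english": "", "teaching_pair": None}


def parse_reply(raw: str) -> dict:
    """Extract the JSON object from an LLM reply, with or without markdown fences."""
    match = FENCE_RE.search(raw)
    payload = match.group(1) if match else raw
    try:
        data = json.loads(payload.strip())
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict) or data.get("intent") not in VALID_INTENTS:
        return dict(ERROR_REPLY)
    return data


def handle_reply(raw: str, vocabulary: list) -> str:
    """Store a taught pair if present and return the text to send to TTS."""
    data = parse_reply(raw)
    if data["intent"] == "teaching" and data.get("teaching_pair"):
        vocabulary.append(data["teaching_pair"])  # stand-in for the JSONL append
    return data.get("reply", "")
```

Falling back to an `error` intent on malformed output keeps the UI responsive even when the model ignores the contract.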

---

## How the agricultural voice interface works

1. User asks, e.g., *"A bɛ di wa?"* ("Is it OK?") referring to their field.
2. `intent_parser.py` (keyword-based) classifies the request: `check_soil` / `check_weather` / `irrigation_status` / `pest_alert` / etc.
3. `SensorBridge` calls the configured `SENSOR_API_URL` and returns a typed `SensorData`.
4. `voice_responder.py` maps `(Intent, SensorData)` → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
5. TTS speaks the reply.
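
The classify-then-respond flow above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `src/iot/`: the keyword lists, the `SensorData` fields and their units, the comparison directions, and the English reply strings are all hypothetical. Only the intent names and the `SOIL_MOISTURE_LOW=30` and `TEMP_HIGH=38` thresholds are taken from the steps above.

```python
from dataclasses import dataclass
from typing import Optional

SOIL_MOISTURE_LOW = 30  # threshold named in step 4
TEMP_HIGH = 38          # threshold named in step 4
MAX_WORDS = 6           # per-sentence limit for clean TTS


@dataclass
class SensorData:
    soil_moisture: float  # assumed to be a percentage
    temperature: float    # assumed to be degrees Celsius


# Hypothetical English keywords; the real parser works on Bambara and Fula input.
INTENT_KEYWORDS = {
    "check_soil": ("soil", "moisture"),
    "check_weather": ("weather", "rain", "hot"),
}


def classify(text: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a word in the text."""
    words = text.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return None


def respond(intent: Optional[str], data: SensorData) -> str:
    """Map (intent, reading) to a short reply; every sentence stays within MAX_WORDS."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            return "Soil is dry. Water the field today."
        return "Soil moisture is good."
    if intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            return "It is very hot. Protect young plants."
        return "Weather is fine today."
    return "Please ask again."


def max_words_per_sentence(reply: str) -> int:
    """Length in words of the longest sentence, used to check the TTS constraint."""
    return max(len(s.split()) for s in reply.split(".") if s.strip())
```

Keeping the thresholds as module-level constants makes the alert logic easy to audit and to unit-test against mock sensor readings.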

---

## Environment variables

Most variables have sensible defaults, so the Space boots with none of them set. `HF_TOKEN` is the exception: it has no default, and without it the memory loop cannot push to the Hub.

### Core
| Key | Default | Purpose |
|-----|---------|---------|
| `HF_TOKEN` | — | HF write token. Required for Hub push and gated models. |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` | Memory-loop target dataset. |
| `ADAPTER_REPO_ID` | `ous-sow/sahel-agri-adapters` | Published LoRA adapters. |
| `WHISPER_MODEL_ID` | `openai/whisper-large-v3-turbo` | STT base model. |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| `LOG_LEVEL` | `INFO` | Standard Python logging level. |
| `DEVICE` | `cuda` (FastAPI) | Torch device for inference. |

### Adapters & TTS
| Key | Default |
|-----|---------|
| `BAMBARA_ADAPTER_PATH` | `./adapters/bambara` |
| `FULA_ADAPTER_PATH` | `./adapters/fula` |
| `BAMBARA_TTS_REPO` | `ynnov/ekodi-bambara-tts-female` |
| `FULA_TTS_REPO` | `ous-sow/fula-tts` |

### IoT
| Key | Default |
|-----|---------|
| `SENSOR_API_URL` | *(unset → mock sensor)* |

### Self-Teaching tab (triggers Kaggle training runs)
| Key | Default |
|-----|---------|
| `KAGGLE_USERNAME` | — |
| `KAGGLE_KEY` | — |
| `KAGGLE_KERNEL_SLUG` | `ous-sow/sahel-voice-master-trainer` *(override in prod to `oussow/kaggle-master-trainer` — the actual Kaggle owner slug)* |
| `AUTO_TRAIN_THRESHOLD` | `50` |
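
One way to read these variables with the defaults from the tables above is sketched below. The `Settings` dataclass and `load_settings` helper are hypothetical and only cover a subset of the keys; the actual apps may load their configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    feedback_repo_id: str
    whisper_model_id: str
    llm_model_id: str
    sensor_api_url: Optional[str]
    auto_train_threshold: int

    @property
    def can_push_to_hub(self) -> bool:
        """Without a token the memory loop can still run locally but cannot push."""
        return bool(self.hf_token)

    @property
    def uses_mock_sensor(self) -> bool:
        """An unset SENSOR_API_URL means the mock sensor is used."""
        return not self.sensor_api_url


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, falling back to the documented defaults."""
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        feedback_repo_id=env.get("FEEDBACK_REPO_ID", "ous-sow/sahel-agri-feedback"),
        whisper_model_id=env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3-turbo"),
        llm_model_id=env.get("LLM_MODEL_ID", "CohereLabs/aya-expanse-32b"),
        sensor_api_url=env.get("SENSOR_API_URL") or None,
        auto_train_threshold=int(env.get("AUTO_TRAIN_THRESHOLD", "50")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test, for example `load_settings({"LLM_MODEL_ID": "Qwen/Qwen2.5-72B-Instruct"})`.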

---

## Run locally

```bash
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py

# Full production UI (not currently on the Space)
python app.py

# FastAPI service
python scripts/run_server.py

# Experimental lab UI
python app_lab.py
```

System-level dependency: **ffmpeg** (see `packages.txt`).

---

## Training

LoRA fine-tuning runs on **Kaggle T4** or **RunPod** — not locally. Pick one entrypoint:

| Target | Script | Notebook |
|--------|--------|----------|
| Bambara LoRA | `scripts/train_bambara.py` | `notebooks/kaggle_master_trainer/` |
| Fula LoRA | `scripts/train_fula.py` | `notebooks/kaggle_master_trainer/` |
| Fula TTS | — | `notebooks/train_fula_tts/` *(planned)* |

**Contributor workflow:** edit notebooks locally in `notebooks/<slug>/`, commit with `nbstripout` enabled to keep diffs clean, then run `cd notebooks/<slug> && kaggle kernels push` to execute on a Kaggle GPU. Full walkthrough in `docs/notebook_collaboration.md`.

`docs/kaggle_mcp_setup.md` documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.

---

## Export for edge

```bash
python scripts/export_onnx.py   # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
```

ONNX does not support LoRA hot-swap, so export one file per language. `bitsandbytes` NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in `requirements.txt`).

---

## Tests

```bash
pytest tests/
```

Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).

---

## Space secrets (HF UI → Settings → Secrets)

At minimum:

| Key | Value |
|-----|-------|
| `HF_TOKEN` | write-scope token |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) |

---

## Design constraints (deliberate — do not change without discussion)

- **Adapter hot-swap** via PEFT's multi-adapter API — one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
- **Qwen "adult-child" JSON contract** — structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
- **JSONL + Hub push memory** — no ORM, thread-safe `MemoryManager`, async push so UI never blocks.
- **≤ 6 words per sentence** in `voice_responder.py` for clean MMS-TTS.
- **Adlam ↔ Latin dual-script** handling in `adlam.py` + `bam_normalize.py`.
- **Single-file `app.py`** — intentional for now; do not split without a plan.
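
As an illustration of the "JSONL + Hub push memory" constraint, the sketch below shows one way to combine a lock-guarded append with a background push so the caller never waits on the network. The `JsonlMemory` class and its `push` callback are hypothetical; this is not the project's `MemoryManager`, and the real Hub upload call is deliberately left to the caller.

```python
import json
import threading
from pathlib import Path
from typing import Callable, List, Optional


class JsonlMemory:
    """Append-only JSONL store with a lock and an optional background push."""

    def __init__(self, path, push: Optional[Callable[[Path], None]] = None):
        self.path = Path(path)
        self._lock = threading.Lock()
        self._push = push  # hypothetical hook, e.g. a function that uploads the file

    def append(self, record: dict) -> Optional[threading.Thread]:
        """Write one record under the lock, then push on a daemon thread."""
        line = json.dumps(record, ensure_ascii=False)
        with self._lock:
            with self.path.open("a", encoding="utf-8") as handle:
                handle.write(line + "\n")
        if self._push is None:
            return None
        worker = threading.Thread(target=self._push, args=(self.path,), daemon=True)
        worker.start()
        return worker  # returned so callers and tests can join if they want to

    def last(self, n: int = 5) -> List[dict]:
        """Return the n most recent records, oldest first."""
        with self._lock:
            if not self.path.exists():
                return []
            lines = self.path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines[-n:] if line.strip()]
```

The `last` method mirrors the "last 5 learned words" panel in the UI, and `ensure_ascii=False` keeps Bambara and Adlam characters readable in the file.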

---

## License

MIT.