---
title: Sahel-Voice-Lab — Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.25.0"
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
  - bambara
  - fula
  - speech-recognition
  - text-to-speech
  - agriculture
  - iot
  - language-learning
  - west-africa
  - low-resource-nlp
  - memory
---

# 🌍 Sahel-Voice-Lab

**A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).**

Two intertwined jobs:

1. **Memory loop**: users *teach* the assistant new words; it persists them to a Hugging Face dataset and treats them as the source of truth in future answers.
2. **Agricultural IoT voice interface**: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and receive short answers (≤ 6 words per sentence) that synthesize cleanly in TTS.

The core stack is explicitly **100% non-Meta** (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.

---

## What this Space currently runs — the `ground-zero` minimal baseline

The deployed Space (`app_file: app_minimal.py`) is the **Month 1–3 rebuild**
baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field
testing and to build a real-user eval set. No LoRA adapters, no memory loop,
no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in
`app.py` still exists for the full production stack; it is just not what the
Space serves today.

Four stacked changes improve dialect fidelity without any training:

1. **Stage 1 — dialect-pinned system prompt** (`src/llm/minimal_client.py`).
   Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose
   system prompt pins the target dialect explicitly — *Bambara as spoken in
   Bamako, Mali* and *Pular of Fuuta Jallon, as spoken in Guinea* — names the
   languages the model must **not** drift into (Wolof, Hausa, Pulaar of
   Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair
   bilingual gold list as few-shot anchoring
   (`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).

2. **Stage 2 — curated phrasebook short-circuit** (`src/llm/phrasebook.py`).
   Before calling the LLM, the user's input is normalised and fuzzy-matched
   (threshold 0.88) against a curated English-keyed phrasebook
   (`configs/dialect_anchors/{bambara,pular}_phrasebook.json` — 100 Bambara /
   110 Pular entries across greetings, family, food, farming, health,
   shopping, travel, clarity, time, parting). A hit returns the gold
   translation directly, with no LLM call and therefore no generation risk
   or inference latency.

3. **Stage 3 — better multilingual base LLM.**
   Default `LLM_MODEL_ID` is now **`CohereLabs/aya-expanse-32b`**, a 23-language
   multilingual model with much stronger West African coverage than Qwen
   2.5-7B. Can be overridden via the `LLM_MODEL_ID` env var (e.g. to
   `Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
   available on your HF account.

4. **Stage 4 — split translate / reply UI + per-turn telemetry + RAG few-shot.**
   Both Voice and Text tabs use a 4-box layout: phrasebook translation (text
   + audio) is automatic on submit (no LLM), and a separate **Generate reply**
   button calls the dialect-anchored LLM for a conversational response. On a
   phrasebook miss the LLM is RAG-injected with the top-3 nearest curated
   pairs as additional style anchoring. Every turn is appended to
   `data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency
   breakdown, phrasebook hit, and reply — the substrate for hit-rate
   measurement, A/B comparisons, and eventual Stage-5 LoRA training-data
   curation. The system prompt now also explicitly tells the LLM to **reply,
   not translate** — the few-shot pairs are framed as style/orthography
   references only, fixing the "the LLM just echoes the phrasebook target"
   regression.
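
The phrasebook short-circuit (Stage 2) and the top-3 retrieval on a miss (Stage 4) can be sketched as follows. This is a minimal illustration, not the code in `src/llm/phrasebook.py`: the matcher here is the standard library's `difflib.SequenceMatcher`, the function names `normalise`, `rank`, and `lookup` are hypothetical, and the phrasebook entries are placeholders rather than real translations. Only the 0.88 threshold and the top-3 count come from the description above.

```python
import re
import unicodedata
from difflib import SequenceMatcher

THRESHOLD = 0.88  # fuzzy-match cutoff quoted in Stage 2

# Hypothetical stand-in for an English-keyed phrasebook file.
# The targets are placeholders, not real Bambara or Pular.
PHRASEBOOK = {
    "good morning": "[gold translation 1]",
    "thank you very much": "[gold translation 2]",
    "where is the market": "[gold translation 3]",
    "how is your family": "[gold translation 4]",
}


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def rank(query: str) -> list[tuple[float, str]]:
    """Score every phrasebook key against the query, best first."""
    q = normalise(query)
    scored = [(SequenceMatcher(None, q, key).ratio(), key) for key in PHRASEBOOK]
    return sorted(scored, reverse=True)


def lookup(query: str, k: int = 3) -> dict:
    """Return a gold hit, or the k nearest pairs for few-shot anchoring."""
    ranked = rank(query)
    best_score, best_key = ranked[0]
    if best_score >= THRESHOLD:
        # Hit: the caller can skip the LLM entirely.
        return {"hit": True, "translation": PHRASEBOOK[best_key], "score": best_score}
    # Miss: hand the nearest curated pairs to the LLM prompt builder.
    nearest = [(key, PHRASEBOOK[key]) for _, key in ranked[:k]]
    return {"hit": False, "nearest": nearest, "score": best_score}
```

On a hit the caller returns `translation` as is; on a miss the `nearest` pairs would be appended to the system prompt as style and orthography references.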

See `docs/baseline_rebuild.md` for the broader minimal-track plan.
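
The per-turn telemetry described in Stage 4 amounts to appending one JSON object per line and aggregating later. The sketch below assumes a record shape; the field names (`ts`, `user_text`, `latency_ms`, and so on) and the helpers `log_turn` and `hit_rate` are hypothetical and may not match `src/engine/turn_logger.py`.

```python
import json
import time
from pathlib import Path


def log_turn(path: Path, *, phase: str, user_text: str, reply: str,
             phrasebook_hit: bool, latency_ms: dict) -> dict:
    """Append one turn to a JSONL file and return the record written."""
    record = {
        "ts": time.time(),
        "phase": phase,                # e.g. "translate" or "reply" (assumed values)
        "user_text": user_text,
        "reply": reply,
        "phrasebook_hit": phrasebook_hit,
        "latency_ms": latency_ms,      # per-stage breakdown, e.g. {"stt": 120.0}
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Bambara and Pular characters readable on disk.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def hit_rate(path: Path) -> float:
    """Fraction of logged turns that were answered by the phrasebook."""
    with path.open(encoding="utf-8") as fh:
        turns = [json.loads(line) for line in fh if line.strip()]
    if not turns:
        return 0.0
    return sum(t["phrasebook_hit"] for t in turns) / len(turns)
```

Because each line is an independent JSON object, the file can be appended to without rewriting earlier turns and read back incrementally for hit-rate or A/B analysis.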

---

## Status

| Phase | Feature | State |
|------:|---------|-------|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS — Bambara | ✅ shipped |
| 2 | Waxal VITS TTS — Fula | ⏳ placeholder until `ous-sow/fula-tts` is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| — | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |

See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` for the parallel minimal-track strategy.

---

## Stack

| Layer | Tool |
|-------|------|
| STT | `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | `CohereLabs/aya-expanse-32b` (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient — overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen2.5-7B-Instruct`, Mistral, Zephyr |
| Dialect anchoring (minimal) | `src/llm/minimal_client.py` — pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | `src/llm/phrasebook.py` — 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | `facebook/mms-tts-bam`, `facebook/mms-tts-ful` |
| TTS (Bambara) | `ynnov/ekodi-bambara-tts-female` (Waxal VITS) |
| TTS (Fula) | placeholder → `ous-sow/fula-tts` when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 |
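
To make the Speaker ID row concrete, here is a minimal sketch of threshold-based matching on cosine similarity. It assumes embeddings have already been extracted (in the real stack by SpeechBrain ECAPA-TDNN, 192 dimensions); the helper names `cosine` and `identify` are hypothetical, and plain Python lists stand in for tensors.

```python
import math

SPEAKER_THRESHOLD = 0.75  # cosine cutoff from the Stack table


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(embedding, profiles):
    """Return the best-matching enrolled speaker, or None below the cutoff."""
    best_name, best_score = None, -1.0
    for name, enrolled in profiles.items():
        score = cosine(embedding, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= SPEAKER_THRESHOLD else None
```

A result of `None` means no enrolled profile is close enough, so the turn would be treated as coming from an unknown speaker.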

---

## Four entry points (do not conflate)

| File | Purpose | Lifecycle |
|------|---------|-----------|
| `app_minimal.py` | **Minimal baseline Gradio UI** — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
| `src/api/app.py` | **FastAPI service** — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |

---

## Repository layout

```
app_minimal.py                 # Gradio (minimal baseline, served on the HF Space)
app.py                         # Gradio (full production UI)
app_lab.py                     # Gradio (experimental)
requirements.txt               # Spaces runtime — do NOT pin torch/torchaudio
packages.txt                   # apt deps (ffmpeg)
configs/
  base_config.yaml             # shared settings
  api_config.yaml              # FastAPI-specific
  lora_bambara.yaml            # Bambara LoRA hyperparams
  lora_fula.yaml               # Fula LoRA hyperparams
data/
  phrases/                     # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl             # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md           # full architectural walkthrough + action plan
  baseline_rebuild.md          # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md    # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md          # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/       # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/              # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb            # legacy Colab trainer
scripts/
  train_bambara.py             # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py               # merge LoRA -> ONNX -> TFLite
  verify_baseline.py           # eval harness
  run_server.py                # FastAPI launcher
  run_data_pipeline.py         # dataset prep
  push_to_hf.sh                # deploy helpers
  push_to_kaggle.sh            # deploy helpers
  runpod_setup.sh
src/
  api/                         # FastAPI app, schemas, routes, middleware
  conversation/                # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                        # dataset loading + normalization (Adlam, Bambara)
  engine/                      # adapter_manager, transcriber, stt_processor, curiosity
  iot/                         # intent_parser, voice_responder, sensor_bridge
  llm/                         # LLM client wrappers
  memory/                      # vocabulary persistence
  optimization/                # ONNX / quantization helpers
  training/                    # trainer, callbacks, augmenters
  tts/                         # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                       # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                         # pytest — api, data pipeline, engine, iot
```

---

## How the memory loop works

1. Press **Push-to-Talk** → speak in Bambara, Fula, French, or English.
2. **Whisper** transcribes. If the language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
3. **Qwen** reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent ∈ {teaching, question, conversation, error}`.
4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and async-pushed to `ous-sow/sahel-agri-feedback → vocabulary.jsonl`.
5. If `question`: Qwen answers using the remembered vocabulary as source of truth.
6. If `conversation`: Qwen replies naturally.
7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

The last 5 learned words are always visible in the UI.
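
Steps 3 and 4 and the "last 5 words" view can be sketched as below. This is an illustration under assumptions, not the `MemoryManager` implementation: the function names are hypothetical, the shape of `teaching_pair` (`word` / `meaning`) is invented for the example, and the asynchronous Hub push is left out. The four intents and the optional markdown fences around the JSON come from this README.

```python
import json
import re
import threading
from pathlib import Path

INTENTS = {"teaching", "question", "conversation", "error"}
FENCE = "`" * 3  # markdown code-fence marker the LLM may wrap its JSON in
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE), re.DOTALL
)
_LOCK = threading.Lock()  # serialises writers to the JSONL file


def parse_reply(raw: str) -> dict:
    """Extract the JSON object from an LLM reply, fenced or bare."""
    match = _FENCED.search(raw)
    payload = match.group(1) if match else raw
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return {"intent": "error", "reply": raw}
    if not isinstance(data, dict) or data.get("intent") not in INTENTS:
        return {"intent": "error", "reply": raw}
    return data


def remember(reply: dict, path: Path) -> bool:
    """Append a teaching pair to the local JSONL mirror; True if stored."""
    pair = reply.get("teaching_pair")
    if reply.get("intent") != "teaching" or not pair:
        return False
    with _LOCK, path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(pair, ensure_ascii=False) + "\n")
    return True


def last_learned(path: Path, n: int = 5) -> list:
    """Return the n most recently stored pairs, oldest first."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-n:] if line.strip()]
```

Falling back to `intent: "error"` on malformed output keeps a bad LLM reply from ever being written to the vocabulary file.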

---

## How the agricultural voice interface works

1. User asks, e.g., *"A bɛ di wa?"* ("Is it OK?") referring to their field.
2. `intent_parser.py` (keyword-based) classifies the request: `check_soil` / `check_weather` / `irrigation_status` / `pest_alert` / etc.
3. `SensorBridge` calls the configured `SENSOR_API_URL` and returns a typed `SensorData`.
4. `voice_responder.py` maps `(Intent, SensorData)` → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
5. TTS speaks the reply.
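
The intent-to-reply mapping in steps 2 to 4 can be sketched as below. The threshold names and values are taken from the text; everything else is an assumption for illustration: the units (percent and degrees Celsius), the strict comparison operators, the English keyword table, and the English reply strings, which stand in for the Bambara or Fula replies the real `voice_responder.py` produces.

```python
import re
from dataclasses import dataclass

SOIL_MOISTURE_LOW = 30  # value from the text; percent is an assumed unit
TEMP_HIGH = 38          # value from the text; degrees Celsius is assumed
MAX_WORDS = 6           # sentence-length cap for clean TTS

# Hypothetical English keyword table; the real parser would key on
# Bambara and Fula vocabulary.
KEYWORDS = {
    "check_soil": ("soil", "moisture"),
    "check_weather": ("weather", "rain", "hot"),
}


@dataclass
class SensorData:
    soil_moisture: float
    temperature: float


def parse_intent(text: str) -> str:
    """Keyword-based intent classification."""
    words = re.findall(r"\w+", text.lower())
    for intent, keys in KEYWORDS.items():
        if any(key in words for key in keys):
            return intent
    return "unknown"


def respond(intent: str, data: SensorData) -> list[str]:
    """Map (intent, sensor reading) to short sentences for TTS."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            sentences = ["Soil is dry.", "Water the field today."]
        else:
            sentences = ["Soil moisture is good."]
    elif intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            sentences = ["It is very hot.", "Protect young plants."]
        else:
            sentences = ["Weather is fine."]
    else:
        sentences = ["Please ask again."]
    # Enforce the word cap before anything reaches the synthesiser.
    assert all(len(s.split()) <= MAX_WORDS for s in sentences)
    return sentences
```

Keeping the thresholds as module-level constants next to the reply templates means an agronomist can review alert behaviour in one place.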

---

## Environment variables

Every variable is optional, so the Space boots without any of them set. Without `HF_TOKEN`, however, the memory loop cannot push to the Hub.

### Core
| Key | Default | Purpose |
|-----|---------|---------|
| `HF_TOKEN` | — | HF write token. Required for Hub push and gated models. |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` | Memory-loop target dataset. |
| `ADAPTER_REPO_ID` | `ous-sow/sahel-agri-adapters` | Published LoRA adapters. |
| `WHISPER_MODEL_ID` | `openai/whisper-large-v3-turbo` | STT base model. |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| `LOG_LEVEL` | `INFO` | Standard Python logging level. |
| `DEVICE` | `cuda` (FastAPI) | Torch device for inference. |

### Adapters & TTS
| Key | Default |
|-----|---------|
| `BAMBARA_ADAPTER_PATH` | `./adapters/bambara` |
| `FULA_ADAPTER_PATH` | `./adapters/fula` |
| `BAMBARA_TTS_REPO` | `ynnov/ekodi-bambara-tts-female` |
| `FULA_TTS_REPO` | `ous-sow/fula-tts` |

### IoT
| Key | Default |
|-----|---------|
| `SENSOR_API_URL` | *(unset → mock sensor)* |

### Self-Teaching tab (triggers Kaggle training runs)
| Key | Default |
|-----|---------|
| `KAGGLE_USERNAME` | — |
| `KAGGLE_KEY` | — |
| `KAGGLE_KERNEL_SLUG` | `ous-sow/sahel-voice-master-trainer` *(override in prod to `oussow/kaggle-master-trainer` — the actual Kaggle owner slug)* |
| `AUTO_TRAIN_THRESHOLD` | `50` |
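
One way to read these variables with the defaults listed above is shown below. The `Settings` class and `load_settings` function are hypothetical and cover only a subset of the keys; the repository may load configuration differently (for example through the YAML files under `configs/`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    feedback_repo_id: str
    whisper_model_id: str
    llm_model_id: str
    sensor_api_url: Optional[str]
    auto_train_threshold: int

    @property
    def can_push(self) -> bool:
        """Hub pushes need a write token."""
        return self.hf_token is not None

    @property
    def use_mock_sensor(self) -> bool:
        """An unset SENSOR_API_URL selects the mock sensor."""
        return self.sensor_api_url is None


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        feedback_repo_id=env.get("FEEDBACK_REPO_ID", "ous-sow/sahel-agri-feedback"),
        whisper_model_id=env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3-turbo"),
        llm_model_id=env.get("LLM_MODEL_ID", "CohereLabs/aya-expanse-32b"),
        sensor_api_url=env.get("SENSOR_API_URL") or None,
        auto_train_threshold=int(env.get("AUTO_TRAIN_THRESHOLD", "50")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to exercise in tests without touching the process environment.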

---

## Run locally

```bash
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py

# Full production UI (not currently on the Space)
python app.py

# FastAPI service
python scripts/run_server.py

# Experimental lab UI
python app_lab.py
```

System-level dependency: **ffmpeg** (see `packages.txt`).

---

## Training

LoRA fine-tuning runs on **Kaggle T4** or **RunPod** — not locally. Pick one entrypoint:

| Target | Script | Notebook |
|--------|--------|----------|
| Bambara LoRA | `scripts/train_bambara.py` | `notebooks/kaggle_master_trainer/` |
| Fula LoRA | `scripts/train_fula.py` | `notebooks/kaggle_master_trainer/` |
| Fula TTS | — | `notebooks/train_fula_tts/` *(planned)* |

**Contributor workflow:** edit notebooks locally in `notebooks/<slug>/`, commit with `nbstripout` enabled so diffs stay clean, then run `cd notebooks/<slug> && kaggle kernels push` to execute on a Kaggle GPU. Full walkthrough in `docs/notebook_collaboration.md`.

`docs/kaggle_mcp_setup.md` documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.

---

## Export for edge

```bash
python scripts/export_onnx.py   # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
```

ONNX does not support LoRA hot-swap, so export one file per language. `bitsandbytes` NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in `requirements.txt`).

---

## Tests

```bash
pytest tests/
```

Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).

---

## Space secrets (HF UI → Settings → Secrets)

At minimum:

| Key | Value |
|-----|-------|
| `HF_TOKEN` | write-scope token |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) |

---

## Design constraints (deliberate — do not change without discussion)

- **Adapter hot-swap** via PEFT's multi-adapter API — one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
- **Qwen "adult-child" JSON contract** — structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
- **JSONL + Hub push memory** — no ORM, thread-safe `MemoryManager`, async push so UI never blocks.
- **≤ 6 words per sentence** in `voice_responder.py` for clean MMS-TTS.
- **Adlam ↔ Latin dual-script** handling in `adlam.py` + `bam_normalize.py`.
- **Single-file `app.py`** — intentional for now; do not split without a plan.
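
The adapter hot-swap constraint can be illustrated with a thin wrapper around a model that exposes PEFT's `load_adapter(path, adapter_name=...)` and `set_adapter(name)` methods. The `AdapterSwitcher` class is a hypothetical sketch, not the repository's `AdapterManager`, and `StubModel` exists only so the example runs without loading Whisper. With real PEFT, the first adapter is typically attached through `PeftModel.from_pretrained(base, path, adapter_name=...)` and later ones through `load_adapter`.

```python
class StubModel:
    """Stand-in for a PEFT-wrapped backbone; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def load_adapter(self, path, adapter_name):
        self.calls.append(("load", adapter_name))

    def set_adapter(self, adapter_name):
        self.calls.append(("set", adapter_name))


class AdapterSwitcher:
    """Tracks named adapters on one shared backbone (hypothetical helper)."""

    def __init__(self, model):
        self.model = model
        self.loaded = set()
        self.active = None

    def register(self, lang, path):
        """Load an adapter's weights once, under the language code."""
        self.model.load_adapter(path, adapter_name=lang)
        self.loaded.add(lang)

    def activate(self, lang):
        """Switch to lang's adapter; return False if none is registered."""
        if lang not in self.loaded:
            return False  # the caller decides how to handle other languages
        if lang != self.active:
            # Only the small adapter changes; the backbone stays in memory.
            self.model.set_adapter(lang)
            self.active = lang
        return True


model = StubModel()
switcher = AdapterSwitcher(model)
switcher.register("bam", "./adapters/bambara")
switcher.register("ful", "./adapters/fula")
switcher.activate("bam")
```

Skipping `set_adapter` when the requested language is already active avoids paying the switch cost on consecutive turns in the same language.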

---

## License

MIT.