Space: MSGEncrypted/lesson-agent-dev

MSG committed 18 days ago

Commit

9939b9d

1 Parent(s): 196a48f

Feat/sunday sprint 1 (#14)

* multilingual lessons

* language page wip

* language page wip lesson

* test teacher

* test teacher lessons language model

Files changed (16)

.cursor/plans/multilingual_coach_cohere_eed97371.plan.md +274 -0
.env.example +9 -3
README.md +3 -3
USAGE.md +37 -6
apps/gradio-space/README.md +18 -9
apps/gradio-space/src/gradio_space/api/studio.py +223 -10
apps/gradio-space/static/studio/index.html +67 -88
apps/gradio-space/static/studio/studio.css +116 -47
apps/gradio-space/static/studio/studio.js +299 -218
libs/echocoach/src/echocoach/config.py +22 -0
libs/echocoach/src/echocoach/pipeline.py +7 -2
libs/echocoach/src/echocoach/prompts.py +63 -5
libs/echocoach/src/echocoach/teacher_voice.py +31 -12
libs/echocoach/tests/test_teacher_voice.py +37 -2
models.yaml +24 -0
voice_models.yaml +5 -3

.cursor/plans/multilingual_coach_cohere_eed97371.plan.md ADDED

	@@ -0,0 +1,274 @@

+---
+name: Multilingual Coach Cohere
+overview: Add a dedicated Studio tab — Language lessons — that unifies multilingual text chat, audio upload, and realtime-style voice in/out (Cohere Transcribe + Tiny Aya + streaming TTS) on one page, replacing the split Voice / pitch-analysis UX for the hackathon demo.
+todos:
+  - id: aya-presets
+    content: Add tiny-aya-global/water/fire/earth to models.yaml; set voice_models.yaml coach_model default; verify TransformersBackend.chat()
+    status: completed
+  - id: locale-prompts
+    content: Add language-lesson system prompt + language_instruction() for lesson/explain modes; wire language into build_teacher_messages() and RAG path
+    status: completed
+  - id: language-lessons-page
+    content: "New Studio nav tab Language lessons: language selector, unified composer (text + mic + upload), chat with inline audio, auto VoiceOut via realtime TTS"
+    status: completed
+  - id: language-lessons-api
+    content: Extend teacher_voice_* API with auto_voiceout flag; reuse existing turn pipeline; optional speak-on-reply default for Language lessons view
+    status: completed
+  - id: cohere-space-defaults
+    content: "Document and set Space secrets: ECHOCOACH_ASR_PRESET=cohere-transcribe, ECHOCOACH_COACH_MODEL=tiny-aya-global, ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b"
+    status: completed
+  - id: echocoach-i18n-polish
+    content: Move Deep pitch analysis to collapsed Advanced or Classic-only; gate English-only filler metrics; fix el Piper voice mapping
+    status: completed
+  - id: demo-docs
+    content: "Update README judge script: single Language lessons tab demo (14-lang voice + 70-lang text); Cohere Labs partner narrative"
+    status: completed
+isProject: false
+---
+# Language lessons — one tab, text + audio + realtime voice (Cohere stack)
+## Goal
+Replace the current split **Voice** experience (TeacherVoice chat + buried EchoCoach pitch panel) with **one primary Studio page: Language lessons** — a multilingual learning coach where the user can interact the same way throughout:
+| Input | Output |
+|-------|--------|
+| **Text** — type a question or lesson prompt | **Text** — chat bubbles in target language |
+| **Mic** — hold / push-to-talk recording | **Audio** — auto-play teacher reply (realtime TTS when available) |
+| **Upload** — `.wav` / `.mp3` clip | **Optional** — replay last reply, toggle auto-speak |
+Backend stays **turn-based** (speak → wait → hear reply), but the page should *feel* realtime: mic stops → transcript appears → first audio chunk plays quickly via VibeVoice Realtime, with Piper fallback.
+Partner stack ([Cohere Labs guide](https://build-small-hackathon-field-guide.hf.space/partners/cohere)): **Cohere Transcribe** (speech in) + **Tiny Aya** (coach brain, 70 langs) + **Piper / VibeVoice** (speech out).
+---
+## What you already have (reuse, don’t rewrite)
+| Building block | Location | Reuse for Language lessons |
+|---|---|---|
+| Multi-turn coach pipeline | [`libs/echocoach/src/echocoach/teacher_voice.py`](libs/echocoach/src/echocoach/teacher_voice.py) | Same `run_teacher_voice_turn` / `run_teacher_voice_text_turn` |
+| Lesson + explain prompts | [`libs/echocoach/src/echocoach/prompts.py`](libs/echocoach/src/echocoach/prompts.py) | `lesson` + `explain` modes (drop pitch from this page) |
+| 14-language ASR/TTS config | [`voice_models.yaml`](voice_models.yaml) | Language dropdown + Cohere ASR + Piper voices |
+| Cohere Transcribe backend | [`libs/echocoach/src/echocoach/asr/cohere.py`](libs/echocoach/src/echocoach/asr/cohere.py) | Default ASR on Space |
+| Streaming TTS | [`libs/echocoach/src/echocoach/tts/vibevoice.py`](libs/echocoach/src/echocoach/tts/vibevoice.py) + `voiceout.py` | `chunk_first=True` already used for TeacherVoice |
+| Studio API | [`apps/gradio-space/src/gradio_space/api/studio.py`](apps/gradio-space/src/gradio_space/api/studio.py) | `teacher_voice_turn`, `teacher_voice_audio_turn`, `voice_presets` |
+| RAG grounding | ResearchMind via `teacher_voice.py` | Optional “Answer from my sources” toggle |
+| Recording helpers | [`studio.js`](apps/gradio-space/static/studio/studio.js) `recordingTarget`, mic start/stop | Extend for hold-to-talk on Language lessons page |
+**Not on this page:** EchoCoach one-shot pitch JSON report → move to **Classic** `/classic` EchoCoach tab only, or a collapsed “Pitch analysis (advanced)” link so Language lessons stays focused on learning.
+---
+## Page design — Language lessons tab
+### Navigation
+In [`apps/gradio-space/static/studio/index.html`](apps/gradio-space/static/studio/index.html):
+- Add/rename sidebar item: **`Language lessons`** (`data-view="language-lessons"`) with icon `translate` or `school`.
+- Demote current **Voice** nav (pitch + mixed modes) → remove from primary nav, or keep **Voice** as alias redirecting to Language lessons for one release cycle.
+- Classic `/classic` keeps full TeacherVoice + EchoCoach tabs unchanged.
+### Layout (single page)
+```text
+┌─────────────────────────────────────────────────────────────┐
+│ Language lessons                                             │
+│ Learn in your language — text, voice, or upload audio        │
+├──────────────┬──────────────────────────────────────────────┤
+│ LEFT RAIL    │ MAIN — conversation                          │
+│              │                                              │
+│ Target lang ▼│  [User bubble — text or transcript]         │
+│ Coach model  │  [Teacher bubble — text + inline ▶ audio]   │
+│  (Aya Global)│  ...                                         │
+│              │                                              │
+│ Lesson topic │ ── UNIFIED COMPOSER ──────────────────────  │
+│              │ [ Text area — always visible ]               │
+│ ☑ Use sources│ [ 🎤 Hold to speak ] [ 📎 Upload audio ]     │
+│              │ [ Send ]  ☑ Auto-speak replies               │
+│ Add sources  │ Status: Listening… / Transcribing… / …       │
+│ (details)    │                                              │
+└──────────────┴──────────────────────────────────────────────┘
+```
+**Left rail controls**
+- **Target language** — required; populated from `voice_presets.languages` (14 voice langs).
+- **Coach variant** (optional Advanced): Auto regional → Tiny Aya Global / Water / Fire / Earth.
+- **Lesson topic** — defaults to workspace topic; grounds lesson mode.
+- **Use indexed sources** — same as current `#use-rag`; applies to explain + lesson.
+- **Add sources** — reuse voice-rail ingest (discover, URLs, PDF) or link to Research view.
+**Main conversation**
+- Messages format: user shows typed text or “🎤 transcript”; assistant shows reply text + embedded `<audio controls autoplay>` when VoiceOut path returned.
+- Empty state copy: “Choose a language, then type, speak, or upload audio to start your lesson.”
+**Unified composer (one place for all input modes)**
+1. **Text** — textarea + **Send** → `teacher_voice_turn` with `mode=lesson` (default) or toggle **Explain** vs **Lesson coach** (two small pills, not three modes).
+2. **Mic** — **Hold to speak** (mousedown/touchstart → record, release → stop → auto `teacher_voice_audio_turn`). Reuse existing `recordingTarget` pattern; set `state.recordingTarget = "language-lessons"`.
+3. **Upload** — file input → preview waveform/name → **Send audio** or auto-send on select.
+4. **Auto-speak replies** — checkbox default **on**; passes through to API so server always synthesizes TTS (already default in pipeline when `synthesize_voice_reply` runs).
+**Realtime voice output behavior**
+- Use `ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b` for Language lessons page (14 langs experimental on VibeVoice; fallback to Piper per lang).
+- Frontend: on response, `autoplay` first audio element; show “Speaking…” while playing.
+- Honest scope: **not** full-duplex WebSocket; latency target is “release mic → hear teacher within ~1–3s on GPU” via chunked TTS already in `voiceout.py`.
+**70-language text demo (no voice required)**
+- Language dropdown includes **“Other (text only)”** free-text ISO/code field OR a second “LLM language” field for codes outside Piper set (e.g. `hi`, `sw`).
+- Helper: “Voice in/out: 14 languages · Coach understands 70+ with Tiny Aya.”
+- When language has no Piper voice, show text reply only + banner “VoiceOut not available for this language.”
+---
+## Target architecture
+```mermaid
+flowchart TB
+  subgraph page [Language lessons page]
+    TextIn[Text composer]
+    MicIn[Hold-to-talk mic]
+    FileIn[Audio upload]
+  end
+  TextIn --> Turn[teacher_voice turn]
+  MicIn --> ASR[Cohere Transcribe 2B]
+  FileIn --> ASR
+  ASR --> Turn
+  Turn --> Aya[Tiny Aya Global or regional]
+  RAG[ResearchMind RAG] --> Aya
+  Aya --> Reply[Lesson reply text]
+  Reply --> TTS[VibeVoice Realtime or Piper]
+  TTS --> AutoPlay[Inline autoplay audio]
+  Reply --> Chat[Chat bubbles]
+```
+---
+## Gaps to close (updated)
+1. **No dedicated Language lessons view** — today everything lives under generic **Voice** with pitch mode + EchoCoach panel ([`index.html` L303–419](apps/gradio-space/static/studio/index.html)).
+2. **Language not wired in Studio JS** — hardcoded `default_language` in [`studio.js`](apps/gradio-space/static/studio/studio.js) (~L1187).
+3. **Split send paths** — “Send text” vs “Send voice turn” should become one flow with auto-routing by input type.
+4. **Manual replay buttons** — “Speak full reply” should be default-on for Language lessons; keep replay as secondary.
+5. **Coach LLM** — still MiniCPM5 1B; need Tiny Aya presets for multilingual quality.
+6. **Default ASR** — Whisper tiny, not Cohere Transcribe.
+7. **Pitch/EchoCoach clutter** — remove from primary Language lessons UX.
+---
+## Implementation plan
+### 1. Backend — Tiny Aya + locale prompts (unchanged core)
+Add to [`models.yaml`](models.yaml):
+| Preset | HF model_id |
+|--------|-------------|
+| `tiny-aya-global` | `CohereLabs/tiny-aya-global` |
+| `tiny-aya-water` | `CohereLabs/tiny-aya-water` |
+| `tiny-aya-fire` | `CohereLabs/tiny-aya-fire` |
+| `tiny-aya-earth` | `CohereLabs/tiny-aya-earth` |
+Set `voice_models.yaml` → `defaults.coach_model: tiny-aya-global`.
+In [`prompts.py`](libs/echocoach/src/echocoach/prompts.py):
+- Add `LANGUAGE_LESSON_SYSTEM` (or extend `LESSON_SYSTEM` / `EXPLAIN_SYSTEM`) with explicit target-language instruction.
+- Add `language_instruction(language: str) -> str` injected in `build_teacher_messages()`.
+Optional `resolve_aya_preset(language)` for Water/Fire/Earth when user picks “Auto regional”.
+### 2. Backend — Language lessons API surface
+In [`studio.py`](apps/gradio-space/src/gradio_space/api/studio.py):
+- Add thin wrapper `api_language_lesson_turn(...)` OR alias existing endpoints with fixed `mode` default `lesson`.
+- Parameters: `message`, `audio_path`, `language`, `topic`, `use_rag`, `history`, `mode` (`lesson`|`explain`), `auto_voiceout=True`, `coach_model` optional override.
+- Ensure `language` is always passed through to ASR + TTS + prompts (no default-only path from frontend).
+Register in Studio HTML boot (`initLanguageLessons()` parallel to `initVoicePresets()`).
+### 3. Frontend — new Language lessons page
+Files: [`studio_html.py`](apps/gradio-space/src/gradio_space/ui/studio_html.py) (fragment), [`index.html`](apps/gradio-space/static/studio/index.html), [`studio.js`](apps/gradio-space/static/studio/studio.js), [`studio.css`](apps/gradio-space/static/studio/studio.css).
+- New `<section class="col col-studio" data-view-panel="language-lessons">` with layout above.
+- JS module: `state.languageLesson = { language, mode, autoSpeak, history }`.
+- Wire nav `data-view="language-lessons"` in existing view switcher.
+- **Hold-to-talk**: pointerdown on `#btn-lesson-hold-mic` → start recording; pointerup → stop → `sendLanguageLessonAudioTurn(path)`.
+- **Unified send**: if textarea non-empty → text turn; else if pending audio → audio turn.
+- **Render**: extend chat renderer to show inline audio on assistant messages (reuse `renderVoiceReply` patterns).
+- Remove pitch mode cards and `#voice-pitch-analysis` from this view (Classic EchoCoach tab remains).
+### 4. Space defaults (Cohere partner demo)
+```bash
+ECHOCOACH_ASR_PRESET=cohere-transcribe
+ECHOCOACH_COACH_MODEL=tiny-aya-global
+ECHOCOACH_TTS_PRESET=piper-multilingual
+ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
+```
+Document in [`USAGE.md`](USAGE.md). GPU Space recommended.
+### 5. Polish & demote pitch analysis
+- Gate English-only filler metrics in EchoCoach when `language != "en"`.
+- Fix Greek Piper mapping (`el`) in `voice_models.yaml`.
+- EchoCoach deep analysis: Classic tab only, or footer link “Practice a monologue (pitch metrics)” opening Classic.
+### 6. Demo script (single tab)
+Update [`README.md`](README.md) / [`apps/gradio-space/README.md`](apps/gradio-space/README.md):
+1. Open **Language lessons**.
+2. Select **French** → hold mic → ask “Explique le fine-tuning en termes simples.” → hear Piper/VibeVoice reply.
+3. Switch to **Spanish**, type a follow-up question (text in, text + audio out).
+4. Select **Hindi** (text-only) → show Tiny Aya Fire-quality written lesson snippet.
+5. Toggle **Use sources** after ingesting one PDF in Research.
+Badge line: **Cohere Labs** — Transcribe + Tiny Aya on one local Language lessons page.
+### 7. Tests
+[`libs/echocoach/tests/test_teacher_voice.py`](libs/echocoach/tests/test_teacher_voice.py):
+- `build_teacher_messages(..., language="fr")` contains French instruction.
+- Optional: API contract test that `language` propagates to mock ASR call.
+---
+## What you do **not** need for hackathon MVP
+- Full duplex / interruptible WebSocket conversation
+- TTS for all 70 Tiny Aya languages
+- Replacing ResearchMind embeddings with multilingual models
+- Keeping pitch practice on the same page as Language lessons
+---
+## Risk notes
+| Risk | Mitigation |
+|------|------------|
+| GPU RAM (Transcribe 2B + Aya 3.3B) | Sequential load on ZeroGPU; dev fallback whisper + Aya |
+| VibeVoice lang coverage gaps | Piper fallback per `voice_models.yaml`; text-only banner |
+| Hold-to-talk on mobile browsers | Push-to-talk fallback buttons (start/stop) |
+| Scope creep from 3-mode Voice tab | Language lessons = **lesson + explain only** |
+---
+## Suggested execution order
+1. Tiny Aya presets + locale prompts (quality foundation)
+2. **Language lessons page** HTML/JS/CSS + unified composer
+3. Wire language + auto_voiceout through API
+4. Space env defaults (Cohere ASR + realtime TTS)
+5. Demote EchoCoach pitch from Studio; docs + demo script
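
Review note: the `locale-prompts` todo and the test in step 7 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `LANGUAGE_NAMES` table, the placeholder `LESSON_SYSTEM` text, and the `build_teacher_messages` signature are hypothetical and are not taken from `prompts.py`.

```python
from typing import Optional

# Hypothetical label table; the real project may derive names from voice_models.yaml.
LANGUAGE_NAMES = {
    "en": "English",
    "fr": "French",
    "es": "Spanish",
    "el": "Greek",
    "hi": "Hindi",
}

# Placeholder system prompt, not the repository's actual wording.
LESSON_SYSTEM = "You are a patient lesson coach."


def language_instruction(language: str) -> str:
    """Return a one-line instruction telling the coach which language to reply in."""
    code = (language or "en").strip().lower()
    name = LANGUAGE_NAMES.get(code)
    if name is None:
        # Unknown codes fall back to the raw code so text-only languages still work.
        return f"Reply in the language with code '{code}'."
    return f"Reply in {name}."


def build_teacher_messages(
    user_text: str,
    *,
    language: str = "en",
    history: Optional[list[dict[str, str]]] = None,
) -> list[dict[str, str]]:
    """Assemble chat messages with the language instruction appended to the system prompt."""
    system = f"{LESSON_SYSTEM} {language_instruction(language)}"
    messages = [{"role": "system", "content": system}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages
```

With this shape, the step 7 check reduces to asserting that the system message for `language="fr"` mentions French.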

.env.example CHANGED

@@ -52,11 +52,17 @@ ALLOW_MODEL_SWITCH=false
 # After training, point Gradio at the adapter preset:
 # ACTIVE_MODEL=minicpm5-1b-lesson-lora
-# --- EchoCoach (voice practice coach) ---
 # VOICE_PRESETS_PATH=./voice_models.yaml
-# ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
 # ECHOCOACH_TTS_PRESET=piper-multilingual
-# ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b   # TeacherVoice VoiceOut (falls back to Piper)
 # ECHOCOACH_COACH_MODEL=minicpm5-1b
 # ECHOCOACH_MAX_SECONDS=30
 # ECHOCOACH_CAPTURE_DEVICE=   # optional ALSA/PipeWire device (e.g. pipewire, alsa_input.pci-...)

 # After training, point Gradio at the adapter preset:
 # ACTIVE_MODEL=minicpm5-1b-lesson-lora
+# --- EchoCoach / Language lessons (voice stack) ---
 # VOICE_PRESETS_PATH=./voice_models.yaml
+# Recommended for Cohere Labs partner demo (GPU Space):
+# ECHOCOACH_ASR_PRESET=cohere-transcribe
+# ECHOCOACH_COACH_MODEL=tiny-aya-global
+# Comma-separated preset keys from models.yaml if primary coach fails to load:
+# ECHOCOACH_COACH_FALLBACK=minicpm5-1b
 # ECHOCOACH_TTS_PRESET=piper-multilingual
+# ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
+# Dev fallback (CPU):
+# ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
 # ECHOCOACH_COACH_MODEL=minicpm5-1b
 # ECHOCOACH_MAX_SECONDS=30
 # ECHOCOACH_CAPTURE_DEVICE=   # optional ALSA/PipeWire device (e.g. pipewire, alsa_input.pci-...)
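
Review note: `ECHOCOACH_COACH_FALLBACK` is described as a comma-separated list of preset keys to try if the primary coach fails to load. A minimal sketch of that behaviour follows. The helper names `parse_fallbacks`, `candidate_chain`, and `load_first` are hypothetical, and the loader contract (return `None` on success, an error string on failure) is an assumption modelled on how `load_error` is checked in `studio.py`.

```python
from typing import Callable, Optional


def parse_fallbacks(raw: Optional[str]) -> list[str]:
    """Split a comma-separated preset list, dropping blanks and surrounding whitespace."""
    return [part.strip() for part in (raw or "").split(",") if part.strip()]


def candidate_chain(primary: str, fallbacks: list[str]) -> list[str]:
    """Primary preset first, then fallbacks, de-duplicated while preserving order."""
    chain: list[str] = []
    seen: set[str] = set()
    for key in (primary, *fallbacks):
        if key and key not in seen:
            seen.add(key)
            chain.append(key)
    return chain


def load_first(
    chain: list[str],
    loader: Callable[[str], Optional[str]],
) -> tuple[Optional[str], Optional[str]]:
    """Try each preset in order.

    Returns (key, None) for the first preset that loads,
    or (None, last_error) if every preset fails.
    """
    last_error: Optional[str] = None
    for key in chain:
        last_error = loader(key)
        if last_error is None:
            return key, None
    return None, last_error
```

In use, the chain would be built from `os.environ.get("ECHOCOACH_COACH_MODEL")` and `os.environ.get("ECHOCOACH_COACH_FALLBACK")`, with the real model loader passed in place of the stub.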

README.md CHANGED

@@ -38,10 +38,10 @@ Open [http://localhost:7860](http://localhost:7860).
 ### Studio UI (Off Brand track)
-The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + three-column workspace (Research → Slides → Voice/Coach), and `@server.api` endpoints wired to the same Python backends as Classic.
-- **`/`** — Studio UI (ingest sources, generate slides, TeacherVoice, EchoCoach)
-- **`/classic`** — full Gradio Blocks app (all tabs, settings, Chat debug)
 See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.

 ### Studio UI (Off Brand track)
+The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + workspace (Research → Slides → Language lessons), and `@server.api` endpoints wired to the same Python backends as Classic.
+- **`/`** — Studio UI (ingest sources, generate slides, **Language lessons** multilingual coach)
+- **`/classic`** — full Gradio Blocks app (TeacherVoice, EchoCoach pitch analysis, settings, Chat debug)
 See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.

USAGE.md CHANGED

@@ -2,7 +2,7 @@
 How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
-The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **TeacherVoice** for spoken back-and-forth tutoring, **EchoCoach** for one-shot pitch analysis, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
 ## Prerequisites
@@ -115,10 +115,11 @@ Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
 | Variable | Default | Description |
 | -------- | ------- | ----------- |
-| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-tiny` | ASR preset key |
 | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
-| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | TeacherVoice streaming TTS (see below) |
-| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b` | Text coach preset (from `models.yaml`) |
 | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
 **Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face — run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.
@@ -129,9 +130,39 @@ Smoke tests (analysis only, no GPU):
 bash scripts/echo_coach_smoke.sh
 ```
-### TeacherVoice — spoken conversation (turn-based)
-The **TeacherVoice** tab is a **multi-turn voice teacher** — not full duplex like a phone call, but speak → wait → hear a reply → repeat.
 | Mode | Purpose |
 | ---- | ------- |

 How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
+The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (Cohere Transcribe + Tiny Aya), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
 ## Prerequisites
 | Variable | Default | Description |
 | -------- | ------- | ----------- |
+| `ECHOCOACH_ASR_PRESET` | `cohere-transcribe` | ASR preset key (Space demo); use `whisper-cpp-tiny` on CPU dev |
 | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
+| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
+| `ECHOCOACH_COACH_MODEL` | `tiny-aya-global` | Text coach preset (Tiny Aya; from `models.yaml`) |
+| `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
 | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
 **Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face — run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.
 bash scripts/echo_coach_smoke.sh
 ```
+### Language lessons — multilingual coach (Studio tab)
+The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.
+| Input | Output |
+| ----- | ------ |
+| Type a question | Chat bubble in target language |
+| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
+| **Other (text only)** language code | Tiny Aya written lesson (no Piper voice for unsupported codes) |
+**Stack (Cohere Labs partner demo):** [Cohere Transcribe](https://huggingface.co/CohereLabs/c4ai-transcribe-v2) (14 voice langs) → [Tiny Aya Global / regional](https://huggingface.co/CohereLabs/tiny-aya-global) (70+ text langs) → Piper or VibeVoice Realtime for speech out.
+Set Space secrets (GPU recommended):
+```bash
+ECHOCOACH_ASR_PRESET=cohere-transcribe
+ECHOCOACH_COACH_MODEL=tiny-aya-global
+ECHOCOACH_TTS_PRESET=piper-multilingual
+ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
+```
+| Mode | Purpose |
+| ---- | ------- |
+| **Explain** | Tutor any topic in plain language |
+| **Lesson coach** | Discuss and outline lesson content |
+Turn-based (not full duplex): speak → wait → hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice.
+Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`).
+### TeacherVoice — Classic UI (turn-based)
+The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher — same pipeline as Language lessons, plus **Pitch practice** mode.
 | Mode | Purpose |
 | ---- | ------- |

apps/gradio-space/README.md CHANGED

@@ -33,8 +33,9 @@ This package uses **Gradio 6 Server mode** (`gradio.Server`):
 **Voice & coach**
 - `teacher_voice_turn`, `teacher_voice_audio_turn`, `teacher_voice_clear`, `teacher_voice_speak`
-- `load_sample_pitch`, `analyze_pitch` (language, ASR preset, `speak_rewrite`)
 - `recording_status`, `recording_start`, `recording_stop`
 - `voice_presets`
@@ -44,15 +45,23 @@ This package uses **Gradio 6 Server mode** (`gradio.Server`):
 - `debug_chat`
 - `save_upload`
-## Demo script (judges)
 1. Open `/` — **Small Model Finetuning** project workspace
-2. Paste a URL in Research → **Ingest URL** → documents appear with **RAG Active**
-3. Center column → **Generate Slides** → slide preview canvas, thumbnail strip, and **Outline** panel
-4. Optional: expand **Research sources** → Web search or RAG modes
-5. Voice view → text or **mic** → full conversation thread + **Speak full reply**
-6. Coach view → **Load sample clip** or record → **Analyze pitch** (charts, transcript, VoiceOut)
-7. Debug sidebar → RAG scope overrides, plain chat or corpus-grounded test with traces
-8. Settings drawer → model status / reload (Classic at `/classic` still available)
 Space card metadata lives in the [repository root README.md](../../README.md).

 **Voice & coach**
+- `language_lesson_turn` — unified text/audio turn for Language lessons (mode, language, `auto_voiceout`, coach variant)
 - `teacher_voice_turn`, `teacher_voice_audio_turn`, `teacher_voice_clear`, `teacher_voice_speak`
+- `load_sample_pitch`, `analyze_pitch` (Classic EchoCoach; language, ASR preset, `speak_rewrite`)
 - `recording_status`, `recording_start`, `recording_stop`
 - `voice_presets`
 - `debug_chat`
 - `save_upload`
+## Demo script (judges) — Language lessons + Cohere stack
+**Badge line:** Cohere Labs — Transcribe + Tiny Aya on one local Language lessons page.
 1. Open `/` — **Small Model Finetuning** project workspace
+2. **Language lessons** tab → select **French** → hold mic → ask *« Explique le fine-tuning en termes simples. »* → hear Piper/VibeVoice reply
+3. Switch to **Spanish**, type a follow-up (text in, text + audio out with **Auto-speak replies** on)
+4. Select **Other (text only)** → enter `hi` → show Tiny Aya Fire-quality written lesson (text-only banner)
+5. Toggle **Use indexed sources** after ingesting one PDF in **Research**
+6. Optional: **Generate Slides** from the Slides tab; **Classic UI** (`/classic`) for EchoCoach pitch metrics
+Space secrets for GPU demo:
+```bash
+ECHOCOACH_ASR_PRESET=cohere-transcribe
+ECHOCOACH_COACH_MODEL=tiny-aya-global
+ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
+```
 Space card metadata lives in the [repository root README.md](../../README.md).
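
Review note: `language_lesson_turn` is listed as a unified text/audio turn, and the plan routes on input type (non-empty text wins, otherwise pending audio). The sketch below shows one way that dispatch could look. The function names, the injected `text_turn` / `audio_turn` callables, and the error payload shape are all hypothetical and may not match the real endpoint in `studio.py`.

```python
from typing import Any, Callable, Optional


def route_lesson_turn(message: str = "", audio_path: Optional[str] = None) -> str:
    """Return 'text' if there is typed text, else 'audio' if a clip is attached, else 'empty'."""
    if message and message.strip():
        return "text"
    if audio_path and audio_path.strip():
        return "audio"
    return "empty"


def language_lesson_turn(
    message: str,
    audio_path: Optional[str],
    *,
    text_turn: Callable[..., dict[str, Any]],
    audio_turn: Callable[..., dict[str, Any]],
    **kwargs: Any,
) -> dict[str, Any]:
    """Dispatch to the text or audio handler, forwarding shared options such as language."""
    route = route_lesson_turn(message, audio_path)
    if route == "text":
        return text_turn(message.strip(), **kwargs)
    if route == "audio":
        return audio_turn(audio_path, **kwargs)
    # Hypothetical error payload; the real API wraps results with its own ok()/err() helpers.
    return {"ok": False, "error": "Type a message or attach audio first."}
```

Passing the handlers in as arguments keeps the routing rule testable without loading any model.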

apps/gradio-space/src/gradio_space/api/studio.py CHANGED

@@ -10,7 +10,7 @@ import gradio as gr
 from echocoach.config import get_echo_coach_config
 from echocoach.pipeline import run_echo_coach
-from echocoach.prompts import TeacherVoiceMode
 from echocoach.recording import (
     ServerRecordingError,
     recording_backend_status,
@@ -51,7 +51,7 @@ from gradio_space.ui.studio_html import (
     render_trace_details,
 )
 from gradio_space.voice_helpers import speak_last_assistant_reply
-from inference.config import get_app_config
 from inference.factory import get_backend
 from researchmind.config import get_config as get_research_config
 from researchmind.ingest import IngestPipeline
@@ -167,11 +167,93 @@ def _voice_stack_summary() -> str:
         f"ASR: {asr.label} ({_echo_config.asr_preset})",
         f"TTS: {tts.label} ({_echo_config.tts_preset})",
         f"Coach model: {_echo_config.coach_model}",
         f"Max recording: {_echo_config.max_seconds}s",
     ]
     return "\n".join(lines)
 def _paths_summary() -> str:
     rm = get_research_config()
     lines = []
@@ -549,9 +631,15 @@ def api_teacher_voice_turn(
     doc_ids: list[str] | None = None,
     language: str = "en",
     asr_preset: str | None = None,
 ) -> dict[str, Any]:
-    model_key = get_active_model_key()
-    load_error = ensure_model_loaded(model_key)
     if load_error:
         return err(load_error)
@@ -567,9 +655,11 @@ def api_teacher_voice_turn(
             language=language,
             topic=topic.strip() or None,
             backend=get_backend(model_key),
             use_rag=use_rag and mode in RAG_MODES,
             session_id=session_id or None,
             doc_ids=doc_ids or None,
         )
     except Exception as exc:  # noqa: BLE001
         return err(str(exc))
@@ -577,9 +667,12 @@ def api_teacher_voice_turn(
     return ok(
         history=result.history,
         assistant=result.assistant_text,
-        status=result.rag_status or "Turn complete.",
         voiceout_path=result.voiceout_path,
         rag_references=result.rag_references,
     )
 def api_teacher_voice_audio_turn(
@@ -592,9 +685,15 @@ def api_teacher_voice_audio_turn(
     doc_ids: list[str] | None = None,
     language: str = "en",
     asr_preset: str | None = None,
 ) -> dict[str, Any]:
-    model_key = get_active_model_key()
-    load_error = ensure_model_loaded(model_key)
     if load_error:
         return err(load_error)
@@ -613,10 +712,12 @@ def api_teacher_voice_audio_turn(
             asr_preset=preset,
             topic=topic.strip() or None,
             backend=get_backend(model_key),
             use_rag=use_rag and mode in RAG_MODES,
             session_id=session_id or None,
             doc_ids=doc_ids or None,
             max_turn_seconds=max_turn,
         )
     except Exception as exc:  # noqa: BLE001
         return err(str(exc))
@@ -624,10 +725,60 @@ def api_teacher_voice_audio_turn(
     return ok(
         history=result.history,
         assistant=result.assistant_text,
-        status=result.rag_status or "Turn complete.",
         voiceout_path=result.voiceout_path,
         user_text=result.user_text,
         rag_references=result.rag_references,
     )
@@ -672,8 +823,7 @@ def api_analyze_pitch(
     asr_preset: str | None = None,
     speak_rewrite: bool = False,
 ) -> dict[str, Any]:
-    model_key = get_active_model_key()
-    load_error = ensure_model_loaded(model_key)
     if load_error:
         return err(load_error)
@@ -686,6 +836,7 @@ def api_analyze_pitch(
             audio_path,
             language=language,
             asr_preset=preset,
             backend=get_backend(model_key),
             speak_rewrite=speak_rewrite,
         )
@@ -786,12 +937,30 @@ def api_recording_stop() -> dict[str, Any]:
 def api_voice_presets() -> dict[str, Any]:
     return ok(
         languages=[{"label": label, "value": value} for label, value in _echo_config.language_choices()],
         asr_presets=[{"label": label, "value": value} for label, value in _echo_config.asr_choices()],
         default_language=_echo_config.language_choices()[0][1] if _echo_config.language_choices() else "en",
         default_asr=_echo_config.asr_preset,
         max_seconds=_echo_config.max_seconds,
     )
@@ -917,6 +1086,38 @@ def register_studio_apis(server: gr.Server) -> None:
             file_paths,
         )
     @server.api(name="teacher_voice_turn")
     def _teacher_voice_turn(
         message: str,
@@ -928,6 +1129,9 @@ def register_studio_apis(server: gr.Server) -> None:
         doc_ids: list[str] | None = None,
         language: str = "en",
         asr_preset: str | None = None,
     ) -> dict[str, Any]:
         return api_teacher_voice_turn(
             message,
@@ -939,6 +1143,9 @@ def register_studio_apis(server: gr.Server) -> None:
             doc_ids,
             language,
             asr_preset,
         )
     @server.api(name="teacher_voice_audio_turn")
@@ -952,6 +1159,9 @@ def register_studio_apis(server: gr.Server) -> None:
         doc_ids: list[str] | None = None,
         language: str = "en",
         asr_preset: str | None = None,
     ) -> dict[str, Any]:
         return api_teacher_voice_audio_turn(
             audio_path,
@@ -963,6 +1173,9 @@ def register_studio_apis(server: gr.Server) -> None:
             doc_ids,
             language,
             asr_preset,
         )
     @server.api(name="teacher_voice_clear")

 from echocoach.config import get_echo_coach_config
 from echocoach.pipeline import run_echo_coach
+from echocoach.prompts import TeacherVoiceMode, resolve_aya_preset
 from echocoach.recording import (
     ServerRecordingError,
     recording_backend_status,
     render_trace_details,
 )
 from gradio_space.voice_helpers import speak_last_assistant_reply
+from inference.config import get_app_config, get_model_config
 from inference.factory import get_backend
 from researchmind.config import get_config as get_research_config
 from researchmind.ingest import IngestPipeline
         f"ASR: {asr.label} ({_echo_config.asr_preset})",
         f"TTS: {tts.label} ({_echo_config.tts_preset})",
         f"Coach model: {_echo_config.coach_model}",
+        f"Coach fallbacks: {', '.join(_echo_config.coach_fallbacks) or 'none'}",
         f"Max recording: {_echo_config.max_seconds}s",
     ]
     return "\n".join(lines)
+def _coach_model_key(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> str:
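+    # Precedence: explicit coach_model, then explicit coach_variant, then the language-based preset.
+    variant = (coach_variant or "").strip()
+    if coach_model and coach_model.strip():
+        key = coach_model.strip()
+    elif variant and variant != "auto":
+        key = variant
+    else:
+        key = resolve_aya_preset(language, variant or "auto")
+    # Regional Aya presets (and a literal "auto") collapse to the global preset here.
+    if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
+        key = "tiny-aya-global"
+    return key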
+def _coach_model_label(model_key: str) -> str:
+    try:
+        return get_model_config(model_key).label
+    except Exception:
+        return model_key
+def _coach_model_candidates(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> list[str]:
+    if coach_model and coach_model.strip():
+        return [coach_model.strip()]
+    primary = _coach_model_key(None, language=language, coach_variant=coach_variant)
+    chain: list[str] = []
+    seen: set[str] = set()
+    for key in (primary, *_echo_config.coach_fallbacks):
+        if key and key not in seen:
+            seen.add(key)
+            chain.append(key)
+    return chain or [primary]
+def _ensure_coach_loaded(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> tuple[str, str | None, str | None]:
+    """Load the first coach preset that succeeds. Returns (key, error, fallback_note)."""
+    candidates = _coach_model_candidates(
+        coach_model,
+        language=language,
+        coach_variant=coach_variant,
+    )
+    errors: list[str] = []
+    for index, key in enumerate(candidates):
+        load_error = ensure_model_loaded(key)
+        if not load_error:
+            if index == 0:
+                return key, None, None
+            label = _coach_model_label(key)
+            note = (
+                f"Primary coach unavailable — using fallback **{label}** (`{key}`). "
+                "Replies still follow your target language via prompts."
+            )
+            return key, None, note
+        errors.append(load_error)
+    return candidates[-1], errors[-1], None
+def _coach_turn_status(base: str | None, fallback_note: str | None) -> str:
+    status = (base or "Turn complete.").strip()
+    if fallback_note:
+        return f"{fallback_note} {status}".strip()
+    return status
+def _voice_language_codes() -> list[str]:
+    return [code for _, code in _echo_config.language_choices()]
 def _paths_summary() -> str:
     rm = get_research_config()
     lines = []
     doc_ids: list[str] | None = None,
     language: str = "en",
     asr_preset: str | None = None,
+    auto_voiceout: bool = True,
+    coach_model: str = "",
+    coach_variant: str = "auto",
 ) -> dict[str, Any]:
+    model_key, load_error, fallback_note = _ensure_coach_loaded(
+        coach_model or None,
+        language=language,
+        coach_variant=coach_variant,
+    )
     if load_error:
         return err(load_error)
             language=language,
             topic=topic.strip() or None,
             backend=get_backend(model_key),
+            coach_model=model_key,
             use_rag=use_rag and mode in RAG_MODES,
             session_id=session_id or None,
             doc_ids=doc_ids or None,
+            auto_voiceout=auto_voiceout,
         )
     except Exception as exc:  # noqa: BLE001
         return err(str(exc))
     return ok(
         history=result.history,
         assistant=result.assistant_text,
+        status=_coach_turn_status(result.rag_status, fallback_note),
         voiceout_path=result.voiceout_path,
+        voiceout_warning=result.voiceout_warning,
         rag_references=result.rag_references,
+        coach_model=model_key,
+        coach_fallback=bool(fallback_note),
     )
 def api_teacher_voice_audio_turn(
     doc_ids: list[str] | None = None,
     language: str = "en",
     asr_preset: str | None = None,
+    auto_voiceout: bool = True,
+    coach_model: str = "",
+    coach_variant: str = "auto",
 ) -> dict[str, Any]:
+    model_key, load_error, fallback_note = _ensure_coach_loaded(
+        coach_model or None,
+        language=language,
+        coach_variant=coach_variant,
+    )
     if load_error:
         return err(load_error)
             asr_preset=preset,
             topic=topic.strip() or None,
             backend=get_backend(model_key),
+            coach_model=model_key,
             use_rag=use_rag and mode in RAG_MODES,
             session_id=session_id or None,
             doc_ids=doc_ids or None,
             max_turn_seconds=max_turn,
+            auto_voiceout=auto_voiceout,
         )
     except Exception as exc:  # noqa: BLE001
         return err(str(exc))
     return ok(
         history=result.history,
         assistant=result.assistant_text,
+        status=_coach_turn_status(result.rag_status, fallback_note),
         voiceout_path=result.voiceout_path,
+        voiceout_warning=result.voiceout_warning,
         user_text=result.user_text,
         rag_references=result.rag_references,
+        coach_model=model_key,
+        coach_fallback=bool(fallback_note),
+    )
+def api_language_lesson_turn(
+    message: str = "",
+    audio_path: str = "",
+    mode: TeacherVoiceMode = "lesson",
+    topic: str = "",
+    session_id: str = "",
+    use_rag: bool = True,
+    history: list | None = None,
+    doc_ids: list[str] | None = None,
+    language: str = "en",
+    asr_preset: str | None = None,
+    auto_voiceout: bool = True,
+    coach_model: str = "",
+    coach_variant: str = "auto",
+) -> dict[str, Any]:
+    """Unified Language lessons turn — routes to text or audio pipeline."""
+    if audio_path and audio_path.strip():
+        return api_teacher_voice_audio_turn(
+            audio_path.strip(),
+            mode=mode,
+            topic=topic,
+            session_id=session_id,
+            use_rag=use_rag,
+            history=history,
+            doc_ids=doc_ids,
+            language=language,
+            asr_preset=asr_preset,
+            auto_voiceout=auto_voiceout,
+            coach_model=coach_model,
+            coach_variant=coach_variant,
+        )
+    return api_teacher_voice_turn(
+        message,
+        mode=mode,
+        topic=topic,
+        session_id=session_id,
+        use_rag=use_rag,
+        history=history,
+        doc_ids=doc_ids,
+        language=language,
+        asr_preset=asr_preset,
+        auto_voiceout=auto_voiceout,
+        coach_model=coach_model,
+        coach_variant=coach_variant,
     )
     asr_preset: str | None = None,
     speak_rewrite: bool = False,
 ) -> dict[str, Any]:
+    model_key, load_error, _fallback_note = _ensure_coach_loaded(None, language=language)
     if load_error:
         return err(load_error)
             audio_path,
             language=language,
             asr_preset=preset,
+            coach_model=model_key,
             backend=get_backend(model_key),
             speak_rewrite=speak_rewrite,
         )
 def api_voice_presets() -> dict[str, Any]:
+    tts = _echo_config.get_tts()
+    voice_langs = _voice_language_codes()
+    coach_chain = _echo_config.coach_model_chain()
+    coach_chain_labels = [_coach_model_label(key) for key in coach_chain]
+    fallback_label = coach_chain_labels[1] if len(coach_chain_labels) > 1 else None
     return ok(
         languages=[{"label": label, "value": value} for label, value in _echo_config.language_choices()],
         asr_presets=[{"label": label, "value": value} for label, value in _echo_config.asr_choices()],
+        coach_variants=[
+            {"label": "Tiny Aya Global (70+ languages)", "value": "tiny-aya-global"},
+        ],
         default_language=_echo_config.language_choices()[0][1] if _echo_config.language_choices() else "en",
         default_asr=_echo_config.asr_preset,
+        default_coach=_echo_config.coach_model,
+        coach_fallbacks=list(_echo_config.coach_fallbacks),
+        coach_chain=coach_chain,
+        coach_chain_labels=coach_chain_labels,
+        voice_languages=voice_langs,
         max_seconds=_echo_config.max_seconds,
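+        # Fall back to the configured coach model if the chain is empty, so this cannot raise IndexError.
+        voiceout_note=(
+            f"Voice in/out: {len(voice_langs)} languages via Piper · "
+            f"Coach: {coach_chain_labels[0] if coach_chain_labels else _echo_config.coach_model}"
+            + (f" (fallback: {fallback_label})" if fallback_label else "")
+        ),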
     )
             file_paths,
         )
+    @server.api(name="language_lesson_turn")
+    def _language_lesson_turn(
+        message: str = "",
+        audio_path: str = "",
+        mode: Literal["explain", "lesson"] = "lesson",
+        topic: str = "",
+        session_id: str = "",
+        use_rag: bool = True,
+        history: list | None = None,
+        doc_ids: list[str] | None = None,
+        language: str = "en",
+        asr_preset: str | None = None,
+        auto_voiceout: bool = True,
+        coach_model: str = "",
+        coach_variant: str = "auto",
+    ) -> dict[str, Any]:
+        return api_language_lesson_turn(
+            message,
+            audio_path,
+            mode,
+            topic,
+            session_id,
+            use_rag,
+            history,
+            doc_ids,
+            language,
+            asr_preset,
+            auto_voiceout,
+            coach_model,
+            coach_variant,
+        )
     @server.api(name="teacher_voice_turn")
     def _teacher_voice_turn(
         message: str,
         doc_ids: list[str] | None = None,
         language: str = "en",
         asr_preset: str | None = None,
+        auto_voiceout: bool = True,
+        coach_model: str = "",
+        coach_variant: str = "auto",
     ) -> dict[str, Any]:
         return api_teacher_voice_turn(
             message,
             doc_ids,
             language,
             asr_preset,
+            auto_voiceout,
+            coach_model,
+            coach_variant,
         )
     @server.api(name="teacher_voice_audio_turn")
         doc_ids: list[str] | None = None,
         language: str = "en",
         asr_preset: str | None = None,
+        auto_voiceout: bool = True,
+        coach_model: str = "",
+        coach_variant: str = "auto",
     ) -> dict[str, Any]:
         return api_teacher_voice_audio_turn(
             audio_path,
             doc_ids,
             language,
             asr_preset,
+            auto_voiceout,
+            coach_model,
+            coach_variant,
         )
     @server.api(name="teacher_voice_clear")
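
The coach fallback behaviour added in `studio.py` can be sketched on its own: build an ordered, de-duplicated candidate chain, then keep the first preset that loads. This is a minimal sketch, not the project's implementation. `candidate_chain`, `load_first_available`, `fake_load`, and the `fallback-model` key are hypothetical names used only for illustration, and the loader is assumed to return `None` on success or an error string on failure, mirroring how `ensure_model_loaded` is used in the diff.

```python
from typing import Callable, Optional


def candidate_chain(primary: str, fallbacks: list[str]) -> list[str]:
    """Return primary followed by fallbacks, dropping blanks and duplicates in order."""
    chain: list[str] = []
    seen: set[str] = set()
    for key in (primary, *fallbacks):
        if key and key not in seen:
            seen.add(key)
            chain.append(key)
    return chain or [primary]


def load_first_available(
    candidates: list[str],
    load: Callable[[str], Optional[str]],
) -> tuple[str, Optional[str], bool]:
    """Try each preset in order and return (key, error, used_fallback)."""
    errors: list[str] = []
    for index, key in enumerate(candidates):
        error = load(key)
        if not error:
            # index > 0 means the primary failed and a fallback was used.
            return key, None, index > 0
        errors.append(error)
    # Nothing loaded: report the last candidate together with its error.
    return candidates[-1], errors[-1], False


# Hypothetical loader: pretend only the fallback preset can be loaded.
def fake_load(key: str) -> Optional[str]:
    return None if key == "fallback-model" else f"cannot load {key}"


chain = candidate_chain("tiny-aya-global", ["fallback-model", "tiny-aya-global"])
result = load_first_available(chain, fake_load)
```

Returning a flag rather than raising lets the caller still answer the turn and surface a "using fallback" note in the status text, which is what `_coach_turn_status` does with `fallback_note`.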

apps/gradio-space/static/studio/index.html CHANGED

@@ -34,7 +34,7 @@
     <nav class="sidebar-nav">
       <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
       <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
-      <button type="button" class="nav-item" data-view="voice"><span class="material-symbols-outlined">mic</span>Voice</button>
       <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">bug_report</span>Debug</button>
       <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
       <a href="/classic" class="nav-item nav-link"><span class="material-symbols-outlined">open_in_new</span>Classic UI</a>
@@ -300,125 +300,104 @@
     </section>
     <section class="col col-studio">
-      <div class="voice-layout view-voice-only">
-        <aside class="voice-rail">
-          <div class="card voice-rag-card">
-            <p class="card-title">RAG Scope</p>
             <label class="toggle-row">
               <span>Answer from my indexed sources</span>
-              <input id="use-rag" type="checkbox" checked />
             </label>
-            <p class="status-text">Ground teacher replies in your workspace documents when enabled.</p>
           </div>
-          <div class="card voice-rail-controls">
             <p class="card-title">Mode</p>
-            <div class="mode-cards voice-mode-cards" id="voice-modes">
               <button type="button" class="mode-card" data-mode="explain">Explain</button>
-              <button type="button" class="mode-card active" data-mode="lesson">Lesson</button>
-              <button type="button" class="mode-card" data-mode="pitch">Practice</button>
             </div>
-            <label class="field voice-topic-wrap" id="voice-topic-wrap">
-              <span>Focus topic</span>
-              <input id="voice-topic" type="text" class="input" placeholder="Uses workspace topic when empty" />
             </label>
-            <details class="voice-rag-sources" id="voice-rag-sources">
               <summary>Add sources (optional)</summary>
               <p class="status-text">Discover or ingest sources to ground answers in your library.</p>
               <div class="ingest-action-row">
-                <button type="button" id="btn-voice-discover" class="btn btn-secondary">Discover on web</button>
-                <button type="button" id="btn-voice-auto-ingest" class="btn btn-secondary">Auto-ingest</button>
               </div>
-              <div id="voice-url-choices-panel" class="url-choices-panel hidden">
-                <div id="voice-url-choices-list" class="url-choices-list"></div>
               </div>
               <label class="field">
                 <span>Paste URLs (one per line)</span>
-                <textarea id="voice-urls-text" class="input" rows="2" placeholder="https://…"></textarea>
               </label>
               <label class="upload-zone upload-zone-compact">
-                <input id="voice-ingest-file" type="file" accept=".pdf,.docx" multiple hidden />
                 <span class="material-symbols-outlined">upload_file</span>
                 <span>Upload PDF or Doc</span>
               </label>
-              <button type="button" id="btn-voice-ingest" class="btn btn-secondary btn-block">Ingest sources</button>
-              <p id="voice-ingest-status" class="status-text"></p>
             </details>
           </div>
         </aside>
-        <div class="voice-main">
-          <div class="card voice-main-card">
-            <div class="voice-card-head">
-              <h2 class="section-label">Teacher Voice</h2>
-              <p class="voice-card-desc">Talk with the teacher using text or voice — grounded in your sources when RAG is on.</p>
             </div>
-            <div id="voice-chat-messages" class="research-chat-messages voice-chat-messages">
-              <p class="research-chat-empty">Type a message or record audio, then send.</p>
             </div>
-            <div class="voice-compose" id="voice-panel">
               <label class="field">
-                <span>Ask the teacher</span>
-                <textarea id="voice-message" class="input" rows="2" placeholder="What is the difference between pretraining and finetuning a small model?"></textarea>
               </label>
-              <div class="voice-input-toolbar">
-                <div class="recording-row voice-recording-row">
-                  <button type="button" id="btn-voice-record-start" class="btn btn-secondary">Start mic</button>
-                  <button type="button" id="btn-voice-record-stop" class="btn btn-secondary" disabled>Stop mic</button>
-                  <input id="voice-audio-upload" type="file" accept="audio/*" class="input input-compact" />
-                </div>
-                <p id="voice-record-status" class="status-text voice-record-status"></p>
-              </div>
-              <div class="voice-send-row">
-                <button type="button" id="btn-voice-send" class="btn btn-secondary">Send text</button>
-                <button type="button" id="btn-voice-audio-send" class="btn btn-primary">Send voice turn</button>
-              </div>
-              <p id="voice-turn-status" class="status-text"></p>
-              <div class="voice-replay-row">
-                <button type="button" id="btn-voice-speak-full" class="btn btn-secondary">Speak full reply</button>
-                <button type="button" id="btn-voice-speak-quick" class="btn btn-secondary">Speak first sentence</button>
-                <button type="button" id="btn-voice-clear" class="btn btn-ghost">Clear conversation</button>
-              </div>
-              <div id="voice-audio-out" class="voice-audio-out"></div>
-            </div>
-          </div>
-          <details class="card voice-pitch-analysis hidden" id="voice-pitch-analysis" open>
-            <summary class="voice-pitch-summary">
-              <span class="section-label">Deep pitch analysis</span>
-              <span class="voice-pitch-summary-hint">Pace, fillers, charts, and spoken rewrite</span>
-            </summary>
-            <div class="coach-panel-wrap">
-              <p class="coach-card-desc">Record or upload a short monologue (up to 30s), then analyze for metrics and feedback.</p>
-              <div class="coach-capture-row">
-                <div class="coach-capture-controls">
-                  <div class="recording-row coach-recording-row">
-                    <button type="button" id="btn-coach-record-start" class="btn btn-secondary">Start mic</button>
-                    <button type="button" id="btn-coach-record-stop" class="btn btn-secondary" disabled>Stop mic</button>
-                    <button type="button" id="btn-coach-sample" class="btn btn-ghost">Load sample</button>
-                  </div>
-                  <p id="coach-record-status" class="status-text coach-record-status"></p>
-                </div>
-                <label class="field coach-upload-field">
-                  <span>Upload pitch (WAV)</span>
-                  <input id="coach-audio" type="file" accept="audio/*" />
                 </label>
               </div>
-              <div class="controls-grid coach-presets">
-                <label class="field">
-                  <span>Language</span>
-                  <select id="coach-language" class="input"></select>
-                </label>
-                <label class="field">
-                  <span>ASR preset</span>
-                  <select id="coach-asr" class="input"></select>
                 </label>
               </div>
-              <label class="toggle-row coach-voiceout-toggle">
-                <span>Speak full rewrite (VoiceOut)</span>
-                <input id="coach-speak-rewrite" type="checkbox" />
-              </label>
-              <button type="button" id="btn-analyze" class="btn btn-primary btn-block coach-analyze-btn">Analyze pitch</button>
-              <div id="coach-panel" class="coach-results-panel"></div>
             </div>
-          </details>
         </div>
       </div>
     </section>

     <nav class="sidebar-nav">
       <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
       <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
+      <button type="button" class="nav-item" data-view="language-lessons"><span class="material-symbols-outlined">translate</span>Language lessons</button>
       <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">bug_report</span>Debug</button>
       <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
       <a href="/classic" class="nav-item nav-link"><span class="material-symbols-outlined">open_in_new</span>Classic UI</a>
     </section>
     <section class="col col-studio">
+      <div class="lessons-layout view-lessons-only">
+        <aside class="lessons-rail">
+          <div class="card lessons-rail-card">
+            <p class="card-title">Target language</p>
+            <label class="field">
+              <span>Lesson language</span>
+              <select id="lessons-language" class="input"></select>
+            </label>
+            <label class="field lessons-other-lang hidden" id="lessons-other-lang-wrap">
+              <span>Text-only language code</span>
+              <input id="lessons-other-lang" type="text" class="input" placeholder="e.g. hi, sw" maxlength="8" />
+            </label>
+            <p id="lessons-voiceout-note" class="status-text"></p>
+            <p class="status-text lessons-coach-model">Coach: Tiny Aya Global (70+ languages)</p>
+            <input type="hidden" id="lessons-coach-variant" value="tiny-aya-global" />
+          </div>
+          <div class="card lessons-rag-card">
+            <p class="card-title">RAG scope</p>
             <label class="toggle-row">
               <span>Answer from my indexed sources</span>
+              <input id="lessons-use-rag" type="checkbox" checked />
             </label>
+            <p class="status-text">Ground lesson replies in your workspace documents when enabled.</p>
           </div>
+          <div class="card lessons-rail-controls">
             <p class="card-title">Mode</p>
+            <div class="mode-cards lessons-mode-cards" id="lessons-modes">
               <button type="button" class="mode-card" data-mode="explain">Explain</button>
+              <button type="button" class="mode-card active" data-mode="lesson">Lesson coach</button>
             </div>
+            <label class="field lessons-topic-wrap">
+              <span>Lesson topic</span>
+              <input id="lessons-topic" type="text" class="input" placeholder="Uses workspace topic when empty" />
             </label>
+            <details class="lessons-rag-sources" id="lessons-rag-sources">
               <summary>Add sources (optional)</summary>
               <p class="status-text">Discover or ingest sources to ground answers in your library.</p>
               <div class="ingest-action-row">
+                <button type="button" id="btn-lessons-discover" class="btn btn-secondary">Discover on web</button>
+                <button type="button" id="btn-lessons-auto-ingest" class="btn btn-secondary">Auto-ingest</button>
               </div>
+              <div id="lessons-url-choices-panel" class="url-choices-panel hidden">
+                <div id="lessons-url-choices-list" class="url-choices-list"></div>
               </div>
               <label class="field">
                 <span>Paste URLs (one per line)</span>
+                <textarea id="lessons-urls-text" class="input" rows="2" placeholder="https://…"></textarea>
               </label>
               <label class="upload-zone upload-zone-compact">
+                <input id="lessons-ingest-file" type="file" accept=".pdf,.docx" multiple hidden />
                 <span class="material-symbols-outlined">upload_file</span>
                 <span>Upload PDF or Doc</span>
               </label>
+              <button type="button" id="btn-lessons-ingest" class="btn btn-secondary btn-block">Ingest sources</button>
+              <p id="lessons-ingest-status" class="status-text"></p>
             </details>
           </div>
         </aside>
+        <div class="lessons-main">
+          <div class="card lessons-main-card">
+            <div class="lessons-card-head">
+              <h2 class="section-label">Language lessons</h2>
+              <p class="lessons-card-desc">Learn in your language — type, hold the mic, or upload audio. Replies can speak back automatically.</p>
             </div>
+            <div id="lessons-chat-messages" class="research-chat-messages lessons-chat-messages">
+              <p class="research-chat-empty">Choose a language, then type, speak, or upload audio to start your lesson.</p>
             </div>
+            <div class="lessons-compose" id="lessons-panel">
               <label class="field">
+                <span>Your message</span>
+                <textarea id="lessons-message" class="input" rows="2" placeholder="What is the difference between pretraining and finetuning a small model?"></textarea>
               </label>
+              <div class="lessons-input-toolbar">
+                <button type="button" id="btn-lessons-hold-mic" class="btn btn-secondary lessons-hold-mic">Hold to speak</button>
+                <button type="button" id="btn-lessons-record-start" class="btn btn-secondary btn-compact">Start mic</button>
+                <button type="button" id="btn-lessons-record-stop" class="btn btn-secondary btn-compact" disabled>Stop mic</button>
+                <label class="lessons-upload-btn btn btn-secondary">
+                  <span class="material-symbols-outlined">upload_file</span>
+                  Upload audio
+                  <input id="lessons-audio-upload" type="file" accept="audio/*" hidden />
                 </label>
               </div>
+              <p id="lessons-record-status" class="status-text lessons-record-status"></p>
+              <div class="lessons-send-row">
+                <button type="button" id="btn-lessons-send" class="btn btn-primary">Send</button>
+                <label class="toggle-row lessons-auto-speak">
+                  <span>Auto-speak replies</span>
+                  <input id="lessons-auto-speak" type="checkbox" checked />
                 </label>
+                <button type="button" id="btn-lessons-clear" class="btn btn-ghost">Clear</button>
               </div>
+              <p id="lessons-turn-status" class="status-text"></p>
             </div>
+          </div>
+          <p class="lessons-classic-link status-text">
+            Pitch metrics and monologue analysis live in
+            <a href="/classic">Classic UI → EchoCoach</a>.
+          </p>
         </div>
       </div>
     </section>
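
The composer above has a single Send button for typed text, mic recordings, and uploads, which matches the unified `language_lesson_turn` endpoint in `studio.py`: a non-blank `audio_path` selects the audio pipeline, anything else goes to the text pipeline. A minimal Python sketch of that routing rule follows. `route_turn` and the `"audio"` / `"text"` labels are hypothetical and stand in for the calls to `api_teacher_voice_audio_turn` and `api_teacher_voice_turn`.

```python
def route_turn(message: str = "", audio_path: str = "") -> tuple[str, str]:
    """Pick the pipeline for one turn: audio wins whenever a non-blank path is given."""
    if audio_path and audio_path.strip():
        return "audio", audio_path.strip()
    return "text", message


text_turn = route_turn("hola")
audio_turn = route_turn("hola", " /tmp/turn.wav ")
blank_audio = route_turn("hi", "   ")
```

Note that when both a message and an audio path are supplied, the typed message is not forwarded to the audio pipeline, so a client would likely want to clear one input before sending the other.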

apps/gradio-space/static/studio/studio.css CHANGED

@@ -387,11 +387,11 @@ body {
 .region-loading-host,
 .card-ingest,
 .card-chat,
-.voice-main-card,
 .coach-panel-wrap,
 .coach-debug-card,
 .controls-panel,
-.voice-rail-controls {
   position: relative;
 }
@@ -1017,26 +1017,26 @@ body {
   .research-layout { grid-template-columns: 1fr; }
 }
-.workspace[data-view="voice"] .col-research,
-.workspace[data-view="voice"] .col-slides { display: none; }
-.workspace[data-view="voice"] .col-debug { display: none; }
-.view-voice-only { display: none; }
-.workspace[data-view="voice"] {
   grid-template-columns: minmax(0, 1fr);
   max-width: 1280px;
   gap: 1.25rem;
 }
-.workspace[data-view="voice"] .col-studio {
   grid-column: 1 / -1;
   width: 100%;
   min-width: 0;
 }
-.workspace[data-view="voice"] .voice-layout {
   display: grid;
   grid-template-columns: minmax(260px, 0.78fr) minmax(0, 1.22fr);
   gap: 1.25rem;
@@ -1044,29 +1044,29 @@ body {
   width: 100%;
 }
-.workspace[data-view="voice"] .voice-rail {
   display: flex;
   flex-direction: column;
   gap: 1rem;
   min-width: 0;
 }
-.workspace[data-view="voice"] .voice-main {
   min-width: 0;
   display: flex;
   flex-direction: column;
   gap: 1rem;
 }
-.workspace[data-view="voice"] .voice-pitch-analysis {
   margin: 0;
 }
-.workspace[data-view="voice"] .voice-pitch-analysis[open] .voice-pitch-summary {
   margin-bottom: 0.75rem;
 }
-.workspace[data-view="voice"] .voice-pitch-summary {
   cursor: pointer;
   list-style: none;
   display: flex;
@@ -1074,63 +1074,63 @@ body {
   gap: 0.2rem;
 }
-.workspace[data-view="voice"] .voice-pitch-summary::-webkit-details-marker {
   display: none;
 }
-.workspace[data-view="voice"] .voice-pitch-summary-hint {
   font-size: 0.84rem;
   color: var(--secondary);
   font-weight: 400;
 }
-.workspace[data-view="voice"] .voice-pitch-analysis .coach-panel-wrap {
   padding-top: 0.25rem;
 }
-.workspace[data-view="voice"] .voice-discuss-btn {
   margin-top: 0.75rem;
 }
-.workspace[data-view="voice"] .coach-results-panel {
   min-height: 80px;
   margin-top: 0.75rem;
   overflow-y: auto;
 }
-.workspace[data-view="voice"] .coach-results-panel:not(:empty) {
   border-top: 1px solid var(--outline-variant);
   padding-top: 0.75rem;
 }
-.workspace[data-view="voice"] .voice-main-card {
   display: flex;
   flex-direction: column;
 }
-.workspace[data-view="voice"] .voice-compose {
   display: flex;
   flex-direction: column;
   gap: 0.5rem;
 }
-.workspace[data-view="voice"] .voice-compose .field {
   margin: 0;
 }
-.workspace[data-view="voice"] .voice-compose textarea {
   min-height: 3.25rem;
   resize: vertical;
 }
-.workspace[data-view="voice"] .voice-rail .voice-mode-cards {
   flex-direction: row;
   flex-wrap: wrap;
   gap: 0.35rem;
   margin-bottom: 0.75rem;
 }
-.workspace[data-view="voice"] .voice-rail .voice-mode-cards .mode-card {
   flex: 1 1 calc(33.333% - 0.35rem);
   text-align: center;
   justify-content: center;
@@ -1139,27 +1139,27 @@ body {
   padding-right: 0.5rem;
 }
-.workspace[data-view="voice"] .voice-rail-controls .voice-topic-wrap {
   margin: 0 0 0.75rem;
 }
-.workspace[data-view="voice"] .voice-rag-sources {
   margin: 0;
 }
-.workspace[data-view="voice"] .voice-rag-sources summary {
   cursor: pointer;
   font-weight: 600;
   font-size: 0.82rem;
 }
-.workspace[data-view="voice"] .voice-chat-messages {
   min-height: 160px;
   max-height: min(260px, 32vh);
   margin: 0 0 0.75rem;
 }
-.workspace[data-view="voice"] .voice-input-toolbar {
   padding: 0.65rem 0.75rem;
   border: 1px solid var(--outline-variant);
   border-radius: var(--radius-lg);
@@ -1167,31 +1167,31 @@ body {
   margin-bottom: 0.65rem;
 }
-.workspace[data-view="voice"] .voice-recording-row {
   margin: 0;
 }
-.workspace[data-view="voice"] .voice-record-status {
   margin: 0.35rem 0 0;
   min-height: 1.1rem;
 }
-.workspace[data-view="voice"] .voice-send-row {
   display: grid;
   grid-template-columns: 1fr 1fr;
   gap: 0.5rem;
   margin-bottom: 0.35rem;
 }
-.workspace[data-view="voice"] .voice-card-head {
   margin-bottom: 0.85rem;
 }
-.workspace[data-view="voice"] .voice-card-head .section-label {
   margin-bottom: 0.35rem;
 }
-.voice-card-desc {
   margin: 0;
   font-size: 0.84rem;
   line-height: 1.45;
@@ -1199,24 +1199,24 @@ body {
 }
 @media (max-width: 960px) {
-  .workspace[data-view="voice"] .voice-layout {
     grid-template-columns: 1fr;
     max-width: 640px;
     margin-left: auto;
     margin-right: auto;
   }
-  .workspace[data-view="voice"] .voice-rail .voice-mode-cards {
     flex-direction: column;
   }
-  .workspace[data-view="voice"] .voice-rail .voice-mode-cards .mode-card {
     flex: 1 1 auto;
     text-align: left;
     justify-content: space-between;
   }
-  .workspace[data-view="voice"] .voice-send-row {
     grid-template-columns: 1fr;
   }
 }
@@ -1421,22 +1421,22 @@ body {
   max-width: 160px;
 }
-.voice-audio-out audio,
 .studio-coach-voiceout audio {
   width: 100%;
   margin-top: 0.5rem;
 }
-.voice-chat-messages {
   max-height: 220px;
   margin: 0.75rem 0;
 }
-.voice-rag-sources {
   margin: 0.75rem 0;
 }
-.voice-rag-sources summary {
   cursor: pointer;
   font-weight: 600;
   font-size: 0.875rem;
@@ -1450,14 +1450,14 @@ body {
   color: var(--on-surface-variant);
 }
-.voice-replay-row {
   display: flex;
   flex-wrap: wrap;
   gap: 0.5rem;
   margin-top: 0.5rem;
 }
-.voice-replay-row .btn-ghost {
   margin-left: auto;
 }
@@ -1578,3 +1578,72 @@ body {
   max-height: 320px;
 }

 .region-loading-host,
 .card-ingest,
 .card-chat,
+.lessons-main-card,
 .coach-panel-wrap,
 .coach-debug-card,
 .controls-panel,
+.lessons-rail-controls {
   position: relative;
 }
   .research-layout { grid-template-columns: 1fr; }
 }
+.workspace[data-view="language-lessons"] .col-research,
+.workspace[data-view="language-lessons"] .col-slides { display: none; }
+.workspace[data-view="language-lessons"] .col-debug { display: none; }
+.view-lessons-only { display: none; }
+.workspace[data-view="language-lessons"] {
   grid-template-columns: minmax(0, 1fr);
   max-width: 1280px;
   gap: 1.25rem;
 }
+.workspace[data-view="language-lessons"] .col-studio {
   grid-column: 1 / -1;
   width: 100%;
   min-width: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-layout {
   display: grid;
   grid-template-columns: minmax(260px, 0.78fr) minmax(0, 1.22fr);
   gap: 1.25rem;
   width: 100%;
 }
+.workspace[data-view="language-lessons"] .lessons-rail {
   display: flex;
   flex-direction: column;
   gap: 1rem;
   min-width: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-main {
   min-width: 0;
   display: flex;
   flex-direction: column;
   gap: 1rem;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-analysis {
   margin: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-analysis[open] .lessons-pitch-summary {
   margin-bottom: 0.75rem;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-summary {
   cursor: pointer;
   list-style: none;
   display: flex;
   gap: 0.2rem;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-summary::-webkit-details-marker {
   display: none;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-summary-hint {
   font-size: 0.84rem;
   color: var(--secondary);
   font-weight: 400;
 }
+.workspace[data-view="language-lessons"] .lessons-pitch-analysis .coach-panel-wrap {
   padding-top: 0.25rem;
 }
+.workspace[data-view="language-lessons"] .lessons-discuss-btn {
   margin-top: 0.75rem;
 }
+.workspace[data-view="language-lessons"] .coach-results-panel {
   min-height: 80px;
   margin-top: 0.75rem;
   overflow-y: auto;
 }
+.workspace[data-view="language-lessons"] .coach-results-panel:not(:empty) {
   border-top: 1px solid var(--outline-variant);
   padding-top: 0.75rem;
 }
+.workspace[data-view="language-lessons"] .lessons-main-card {
   display: flex;
   flex-direction: column;
 }
+.workspace[data-view="language-lessons"] .lessons-compose {
   display: flex;
   flex-direction: column;
   gap: 0.5rem;
 }
+.workspace[data-view="language-lessons"] .lessons-compose .field {
   margin: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-compose textarea {
   min-height: 3.25rem;
   resize: vertical;
 }
+.workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards {
   flex-direction: row;
   flex-wrap: wrap;
   gap: 0.35rem;
   margin-bottom: 0.75rem;
 }
+.workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards .mode-card {
   flex: 1 1 calc(33.333% - 0.35rem);
   text-align: center;
   justify-content: center;
   padding-right: 0.5rem;
 }
+.workspace[data-view="language-lessons"] .lessons-rail-controls .lessons-topic-wrap {
   margin: 0 0 0.75rem;
 }
+.workspace[data-view="language-lessons"] .lessons-rag-sources {
   margin: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-rag-sources summary {
   cursor: pointer;
   font-weight: 600;
   font-size: 0.82rem;
 }
+.workspace[data-view="language-lessons"] .lessons-chat-messages {
   min-height: 160px;
   max-height: min(260px, 32vh);
   margin: 0 0 0.75rem;
 }
+.workspace[data-view="language-lessons"] .lessons-input-toolbar {
   padding: 0.65rem 0.75rem;
   border: 1px solid var(--outline-variant);
   border-radius: var(--radius-lg);
   margin-bottom: 0.65rem;
 }
+.workspace[data-view="language-lessons"] .lessons-recording-row {
   margin: 0;
 }
+.workspace[data-view="language-lessons"] .lessons-record-status {
   margin: 0.35rem 0 0;
   min-height: 1.1rem;
 }
+.workspace[data-view="language-lessons"] .lessons-send-row {
   display: grid;
   grid-template-columns: 1fr 1fr;
   gap: 0.5rem;
   margin-bottom: 0.35rem;
 }
+.workspace[data-view="language-lessons"] .lessons-card-head {
   margin-bottom: 0.85rem;
 }
+.workspace[data-view="language-lessons"] .lessons-card-head .section-label {
   margin-bottom: 0.35rem;
 }
+.lessons-card-desc {
   margin: 0;
   font-size: 0.84rem;
   line-height: 1.45;
 }
 @media (max-width: 960px) {
+  .workspace[data-view="language-lessons"] .lessons-layout {
     grid-template-columns: 1fr;
     max-width: 640px;
     margin-left: auto;
     margin-right: auto;
   }
+  .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards {
     flex-direction: column;
   }
+  .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards .mode-card {
     flex: 1 1 auto;
     text-align: left;
     justify-content: space-between;
   }
+  .workspace[data-view="language-lessons"] .lessons-send-row {
     grid-template-columns: 1fr;
   }
 }
   max-width: 160px;
 }
+.lessons-audio-out audio,
 .studio-coach-voiceout audio {
   width: 100%;
   margin-top: 0.5rem;
 }
+.lessons-chat-messages {
   max-height: 220px;
   margin: 0.75rem 0;
 }
+.lessons-rag-sources {
   margin: 0.75rem 0;
 }
+.lessons-rag-sources summary {
   cursor: pointer;
   font-weight: 600;
   font-size: 0.875rem;
   color: var(--on-surface-variant);
 }
+.lessons-replay-row {
   display: flex;
   flex-wrap: wrap;
   gap: 0.5rem;
   margin-top: 0.5rem;
 }
+.lessons-replay-row .btn-ghost {
   margin-left: auto;
 }
   max-height: 320px;
 }
+.lessons-rail-card .field + .field {
+  margin-top: 0.65rem;
+}
+.lessons-input-toolbar {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.5rem;
+  align-items: center;
+  margin-top: 0.5rem;
+}
+.lessons-hold-mic.is-recording {
+  background: var(--primary-container);
+  color: var(--on-primary-container);
+}
+.lessons-upload-btn {
+  cursor: pointer;
+  display: inline-flex;
+  align-items: center;
+  gap: 0.35rem;
+}
+.lessons-auto-speak {
+  margin: 0;
+  flex: 1 1 auto;
+  justify-content: flex-end;
+}
+.lessons-send-row {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.65rem;
+  align-items: center;
+  margin-top: 0.65rem;
+}
+.lessons-chat-messages .chat-audio-inline {
+  margin-top: 0.5rem;
+  width: 100%;
+}
+.lessons-classic-link {
+  margin-top: 0.75rem;
+  text-align: center;
+}
+.lessons-classic-link a {
+  color: var(--primary);
+}
+.lessons-message-user::before {
+  content: "You · ";
+  font-weight: 600;
+  opacity: 0.75;
+}
+.lessons-message-assistant::before {
+  content: "Teacher · ";
+  font-weight: 600;
+  opacity: 0.75;
+}
+.btn-compact {
+  padding-inline: 0.65rem;
+  font-size: 0.82rem;
+}

apps/gradio-space/static/studio/studio.js CHANGED

@@ -40,11 +40,11 @@ const state = {
   selectedUrls: [],
   slideDiscoveredUrls: [],
   slideSelectedUrls: [],
-  voiceDiscoveredUrls: [],
-  voiceSelectedUrls: [],
   researchChatHistory: [],
   debugChatHistory: [],
-  voiceMode: "lesson",
   history: [],
   downloads: null,
   client: null,
@@ -55,9 +55,8 @@ const state = {
   recordingTarget: null,
   browserRecorder: null,
   browserRecordChunks: [],
-  pendingVoiceAudioPath: null,
-  pendingCoachAudioPath: null,
-  lastPitchAnalysis: null,
   useBrowserMic: true,
 };
@@ -223,16 +222,37 @@ function renderResearchUrlChoices(urls, selected) {
   if (getIngestWorkflow() === "select") panel?.classList.remove("hidden");
 }
-function voiceEffectiveTopic() {
-  if (state.voiceMode === "pitch") return effectiveTopic("");
-  return effectiveTopic($("#voice-topic")?.value || "");
 }
-function voiceUseRag() {
-  return $("#use-rag").checked && state.voiceMode !== "pitch";
 }
-function voiceMessageText(content) {
   if (content == null) return "";
   if (typeof content === "string") return content;
   if (Array.isArray(content)) {
@@ -253,8 +273,14 @@ function ingestSucceeded(status) {
   );
 }
-function applyVoiceIngestResult(data) {
-  $("#voice-ingest-status").textContent = stripMd(data.status || "Ingest complete.");
   state.workspaceSessionId = data.session_id || state.workspaceSessionId;
   $("#workspace-session").value = state.workspaceSessionId;
   if (data.documents_html) {
@@ -264,20 +290,21 @@ function applyVoiceIngestResult(data) {
   updateResearchRagBadge();
   updateResearchDocCount((data.documents || []).length);
   if (ingestSucceeded(data.status)) {
-    $("#use-rag").checked = true;
   }
 }
-async function discoverVoiceSources() {
-  const topic = voiceEffectiveTopic();
   if (!topic) {
-    showError("Set a focus or workspace topic before discovering sources.");
     return;
   }
-  await withRegionLoading($(".voice-rail-controls"), "Discovering sources…", async () => {
     const data = await callApi("discover_sources", [topic, state.workspaceSessionId]);
-    $("#voice-ingest-status").textContent = stripMd(data.status || "Discovery complete.");
-    renderVoiceUrlChoices(data.urls || [], data.selected_urls || data.urls || []);
     if (data.session_id) {
       state.workspaceSessionId = data.session_id;
       $("#workspace-session").value = data.session_id;
@@ -286,32 +313,32 @@ async function discoverVoiceSources() {
   });
 }
-async function autoVoiceIngest() {
-  const topic = voiceEffectiveTopic();
   if (!topic) {
-    showError("Set a focus or workspace topic before auto-ingest.");
     return;
   }
-  await withRegionLoading($(".voice-rail-controls"), "Auto-ingesting sources…", async () => {
     const data = await callApi("auto_search_ingest", [topic, state.workspaceSessionId]);
-    applyVoiceIngestResult(data);
-    state.voiceDiscoveredUrls = [];
-    state.voiceSelectedUrls = [];
-    renderVoiceUrlChoices([], []);
     await refreshWorkspaceSessions(state.workspaceSessionId);
   });
 }
-async function ingestVoiceSources() {
-  const topic = voiceEffectiveTopic();
-  const pasted = $("#voice-urls-text")?.value.trim() || "";
-  const selected = getSelectedDiscoveredUrls("#voice-url-choices-list");
-  const files = $("#voice-ingest-file")?.files;
   if (!pasted && !selected.length && !files?.length) {
     showError("Add URLs, select suggested sources, or upload a file — then ingest.");
     return;
   }
-  await withRegionLoading($(".voice-rail-controls"), "Ingesting sources…", async () => {
     const paths = [];
     if (files?.length) {
       for (const file of files) {
@@ -325,35 +352,40 @@ async function ingestVoiceSources() {
       selected,
       paths,
     ]);
-    applyVoiceIngestResult(data);
-    if (pasted) $("#voice-urls-text").value = "";
-    if (files?.length) $("#voice-ingest-file").value = "";
     await refreshWorkspaceSessions(state.workspaceSessionId);
   });
 }
-function syncVoiceModeUi() {
-  const ragMode = state.voiceMode === "explain" || state.voiceMode === "lesson";
-  const practiceMode = state.voiceMode === "pitch";
-  $("#voice-topic-wrap")?.classList.toggle("hidden", !ragMode);
-  $("#voice-rag-sources")?.classList.toggle("hidden", !ragMode);
-  $(".voice-rag-card")?.classList.toggle("hidden", practiceMode);
-  $("#voice-pitch-analysis")?.classList.toggle("hidden", !practiceMode);
   const placeholders = {
     explain: "e.g. How does finetuning differ from pretraining?",
     lesson: "What is the difference between pretraining and finetuning a small model?",
-    pitch: "e.g. Here is my opening line — how can I improve it?",
   };
-  const messageEl = $("#voice-message");
-  if (messageEl) messageEl.placeholder = placeholders[state.voiceMode] || placeholders.lesson;
 }
-function renderVoiceChat() {
-  const container = $("#voice-chat-messages");
   if (!container) return;
   if (!state.history.length) {
     container.innerHTML =
-      '<p class="research-chat-empty">Type a message or record audio, then send.</p>';
     return;
   }
   const parts = [];
@@ -361,9 +393,13 @@ function renderVoiceChat() {
     if (item && typeof item === "object" && item.role) {
       const role = item.role === "user" ? "user" : "assistant";
       const label = role === "user" ? "You" : "Teacher";
-      let body = renderMarkdownLite(voiceMessageText(item.content));
       if (role === "assistant" && item.rag_references) {
-        body += `<div class="voice-rag-refs">${renderMarkdownLite(item.rag_references)}</div>`;
       }
       parts.push(
         `<div class="research-chat-bubble research-chat-${role}"><div class="research-chat-role">${label}</div><div class="research-chat-body">${body}</div></div>`
@@ -380,18 +416,50 @@ function renderVoiceChat() {
   container.scrollTop = container.scrollHeight;
 }
-function renderVoiceUrlChoices(urls, selected) {
-  state.voiceDiscoveredUrls = urls || [];
-  state.voiceSelectedUrls = selected?.length ? selected : [...state.voiceDiscoveredUrls];
   renderUrlChoices(
     urls,
     selected,
-    "#voice-url-choices-list",
-    "#voice-url-choices-panel",
-    { discovered: state.voiceDiscoveredUrls, selected: state.voiceSelectedUrls }
   );
 }
 function renderSlideUrlChoices(urls, selected) {
   state.slideDiscoveredUrls = urls || [];
   state.slideSelectedUrls = selected?.length ? selected : [...state.slideDiscoveredUrls];
@@ -961,23 +1029,30 @@ async function refreshDocuments() {
   }
 }
-async function initVoicePresets() {
   const data = await callApi("voice_presets", []);
   state.voicePresets = data;
-  const langSelect = $("#coach-language");
-  const asrSelect = $("#coach-asr");
   if (langSelect) {
-    langSelect.innerHTML = (data.languages || [])
       .map((o) => `<option value="${o.value}">${o.label}</option>`)
       .join("");
     langSelect.value = data.default_language || "en";
   }
-  if (asrSelect) {
-    asrSelect.innerHTML = (data.asr_presets || [])
-      .map((o) => `<option value="${o.value}">${o.label}</option>`)
-      .join("");
-    asrSelect.value = data.default_asr || "";
   }
 }
 async function initSettings() {
@@ -1033,10 +1108,10 @@ async function initWorkspace() {
   updateResearchRagBadge();
   await refreshWorkspaceSessions();
   await refreshDocuments();
-  await initVoicePresets();
   await initSettings();
-  syncVoiceModeUi();
-  renderVoiceChat();
   await refreshDebugDocuments();
   const recStatus = await callApi("recording_status", []);
   state.useBrowserMic = !recStatus.backend || /unavailable|no capture/i.test(recStatus.message || "");
@@ -1055,7 +1130,7 @@ async function generateSlides() {
   const topic = effectiveTopic($("#lesson-topic").value);
   const grade = $("#lesson-grade").value;
   const slideCount = Number($("#slide-count").value);
-  const useRag = $("#use-rag").checked;
   const docIds = effectiveDocIds([]);
   const sourceMode = $("#slide-source-mode")?.value || "";
   const searchWorkflow = $("#slide-search-workflow")?.value || "two_step";
@@ -1155,63 +1230,41 @@ async function generateSlides() {
   );
 }
-function renderVoiceReply(data, { keepAudio = false } = {}) {
   state.history = data.history ?? state.history;
-  if (data.rag_references && state.history.length) {
     const last = state.history[state.history.length - 1];
     if (last && typeof last === "object" && last.role === "assistant") {
-      last.rag_references = data.rag_references;
     }
   }
-  renderVoiceChat();
   if (data.status) {
-    $("#voice-turn-status").textContent = stripMd(data.status);
-  }
-  const out = $("#voice-audio-out");
-  if (data.voiceout_path) {
-    out.innerHTML = `<audio controls src="${fileUrl(data.voiceout_path)}"></audio>`;
-  } else if (!keepAudio) {
-    out.innerHTML = "";
   }
 }
-async function sendVoiceTurn() {
-  const message = $("#voice-message").value.trim();
-  if (!message) {
-    showError("Enter a message first.");
-    return;
-  }
-  const topic = voiceEffectiveTopic();
-  const useRag = voiceUseRag();
-  const docIds = effectiveDocIds([]);
-  const language = state.voicePresets?.default_language || "en";
-  await withRegionLoading($(".voice-main-card"), "Teacher is thinking…", async () => {
-    const data = await callApi("teacher_voice_turn", [
-      message,
-      state.voiceMode,
-      topic,
-      state.workspaceSessionId,
-      useRag,
-      state.history,
-      docIds,
-      language,
-      null,
-    ]);
-    $("#voice-message").value = "";
-    renderVoiceReply(data);
-  });
 }
-async function sendVoiceAudioTurn(audioPath) {
-  const topic = voiceEffectiveTopic();
-  const useRag = voiceUseRag();
   const docIds = effectiveDocIds([]);
-  const language = state.voicePresets?.default_language || "en";
   const asr = state.voicePresets?.default_asr || null;
-  await withRegionLoading($(".voice-main-card"), "Processing voice…", async () => {
-    const data = await callApi("teacher_voice_audio_turn", [
-      audioPath,
-      state.voiceMode,
       topic,
       state.workspaceSessionId,
       useRag,
@@ -1219,81 +1272,100 @@ async function sendVoiceAudioTurn(audioPath) {
       docIds,
       language,
       asr,
     ]);
-    if (data.user_text) $("#voice-message").value = data.user_text;
-    renderVoiceReply(data);
   });
 }
-async function speakVoiceReply(firstSentenceOnly) {
-  const language = state.voicePresets?.default_language || "en";
-  const data = await callApi("teacher_voice_speak", [state.history, language, firstSentenceOnly]);
-  $("#voice-turn-status").textContent = stripMd(data.status || "VoiceOut ready.");
-  if (data.voiceout_path) {
-    $("#voice-audio-out").innerHTML = `<audio controls src="${fileUrl(data.voiceout_path)}"></audio>`;
   }
 }
-async function clearVoiceConversation() {
   const data = await callApi("teacher_voice_clear", []);
   state.history = [];
-  renderVoiceChat();
-  $("#voice-message").value = "";
-  $("#voice-turn-status").textContent = stripMd(data.status || "Conversation cleared.");
-  $("#voice-audio-out").innerHTML = "";
-}
-async function loadSamplePitch() {
-  const data = await callApi("load_sample_pitch", []);
-  state.pendingCoachAudioPath = data.audio_path;
-  $("#coach-record-status").textContent = stripMd(data.status || "Sample clip loaded.");
-}
-async function analyzePitchWithPath(audioPath) {
-  const language = $("#coach-language")?.value || "en";
-  const asr = $("#coach-asr")?.value || null;
-  const speakRewrite = $("#coach-speak-rewrite")?.checked || false;
-  await withRegionLoading($("#voice-pitch-analysis"), "Analyzing pitch…", async () => {
-    const data = await callApi("analyze_pitch", [audioPath, language, asr, speakRewrite]);
-    state.lastPitchAnalysis = data;
-    const panel = $("#coach-panel");
-    panel.innerHTML = data.coach_panel_html || "";
-    const discussBtn = document.createElement("button");
-    discussBtn.type = "button";
-    discussBtn.className = "btn btn-secondary voice-discuss-btn";
-    discussBtn.textContent = "Discuss in chat";
-    discussBtn.addEventListener("click", () => discussPitchInChat().catch(() => {}));
-    if (data.transcript_html || data.report_md || data.tip) {
-      panel.appendChild(discussBtn);
-    }
-  });
 }
-function discussPitchInChat() {
-  const data = state.lastPitchAnalysis;
-  if (!data) return;
-  const parts = [];
-  if (data.tip) parts.push(`Coach tip: ${stripMd(data.tip)}`);
-  if (data.report_md) parts.push(stripMd(data.report_md).slice(0, 800));
-  const prompt =
-    parts.length > 0
-      ? `Here is my pitch analysis. Help me improve based on this feedback:\n\n${parts.join("\n\n")}`
-      : "I just ran pitch analysis — what should I work on next?";
-  $("#voice-message").value = prompt;
-  $("#voice-message").focus();
-  const chat = $("#voice-chat-messages");
-  if (chat) chat.scrollIntoView({ behavior: "smooth", block: "nearest" });
-}
-async function analyzePitch() {
-  let path = state.pendingCoachAudioPath;
-  const file = $("#coach-audio").files?.[0];
   if (file) path = await uploadFile(file);
   if (!path) {
-    showError("Record or upload audio to analyze.");
     return;
   }
-  await analyzePitchWithPath(path);
 }
 async function startBrowserRecording(statusEl) {
@@ -1367,23 +1439,11 @@ async function stopRecording(statusEl, startBtn, stopBtn) {
     path = data.path;
     if (statusEl) statusEl.textContent = stripMd(data.status || "Recording saved.");
   }
-  if (state.recordingTarget === "voice") state.pendingVoiceAudioPath = path;
-  if (state.recordingTarget === "coach") state.pendingCoachAudioPath = path;
   state.recordingTarget = null;
   return path;
 }
-async function sendVoiceFromRecording() {
-  let path = state.pendingVoiceAudioPath;
-  const file = $("#voice-audio-upload").files?.[0];
-  if (file) path = await uploadFile(file);
-  if (!path) {
-    showError("Record or upload audio first.");
-    return;
-  }
-  await sendVoiceAudioTurn(path);
-}
 function bindUi() {
   $("#slide-count").addEventListener("input", (e) => {
     $("#slide-count-val").textContent = e.target.value;
@@ -1450,18 +1510,52 @@ function bindUi() {
   });
   $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
-  $("#btn-voice-send").addEventListener("click", () => sendVoiceTurn().catch(() => {}));
-  $("#btn-voice-audio-send").addEventListener("click", () => sendVoiceFromRecording().catch(() => {}));
-  $("#btn-voice-discover")?.addEventListener("click", () => discoverVoiceSources().catch(() => {}));
-  $("#btn-voice-auto-ingest")?.addEventListener("click", () => autoVoiceIngest().catch(() => {}));
-  $("#btn-voice-ingest")?.addEventListener("click", () => ingestVoiceSources().catch(() => {}));
-  $("#voice-ingest-file")?.addEventListener("change", (e) => ingestVoiceSources().catch(() => {}));
-  $("#btn-voice-speak-full")?.addEventListener("click", () => speakVoiceReply(false).catch(() => {}));
-  $("#btn-voice-speak-quick")?.addEventListener("click", () => speakVoiceReply(true).catch(() => {}));
-  $("#btn-voice-clear")?.addEventListener("click", () => clearVoiceConversation().catch(() => {}));
-  $("#btn-coach-sample")?.addEventListener("click", () => loadSamplePitch().catch(() => {}));
-  $("#btn-analyze").addEventListener("click", () => analyzePitch().catch(() => {}));
   $("#btn-debug-send").addEventListener("click", () => sendDebugMessage().catch(() => {}));
   $("#debug-session")?.addEventListener("change", () => refreshDebugDocuments().catch(() => {}));
   $("#debug-refresh-sessions")?.addEventListener("click", () => {
     refreshDebugSessions().catch(() => {});
@@ -1475,19 +1569,6 @@ function bindUi() {
     }
   });
-  $("#btn-voice-record-start")?.addEventListener("click", () =>
-    startRecording("voice", $("#voice-record-status"), $("#btn-voice-record-start"), $("#btn-voice-record-stop")).catch(() => {})
-  );
-  $("#btn-voice-record-stop")?.addEventListener("click", () =>
-    stopRecording($("#voice-record-status"), $("#btn-voice-record-start"), $("#btn-voice-record-stop")).catch(() => {})
-  );
-  $("#btn-coach-record-start")?.addEventListener("click", () =>
-    startRecording("coach", $("#coach-record-status"), $("#btn-coach-record-start"), $("#btn-coach-record-stop")).catch(() => {})
-  );
-  $("#btn-coach-record-stop")?.addEventListener("click", () =>
-    stopRecording($("#coach-record-status"), $("#btn-coach-record-start"), $("#btn-coach-record-stop")).catch(() => {})
-  );
   $("#btn-export").addEventListener("click", () => {
     const p = state.downloads?.pptx;
     if (p) window.open(fileUrl(p), "_blank");
@@ -1506,16 +1587,16 @@ function bindUi() {
     refreshDocuments().catch(() => {});
   });
-  document.querySelectorAll(".mode-card").forEach((btn) => {
     btn.addEventListener("click", () => {
-      document.querySelectorAll(".mode-card").forEach((b) => b.classList.remove("active"));
       btn.classList.add("active");
-      state.voiceMode = btn.dataset.mode;
-      syncVoiceModeUi();
     });
   });
-  syncVoiceModeUi();
 }
 bindUi();

   selectedUrls: [],
   slideDiscoveredUrls: [],
   slideSelectedUrls: [],
+  lessonsDiscoveredUrls: [],
+  lessonsSelectedUrls: [],
   researchChatHistory: [],
   debugChatHistory: [],
+  lessonsMode: "lesson",
   history: [],
   downloads: null,
   client: null,
   recordingTarget: null,
   browserRecorder: null,
   browserRecordChunks: [],
+  pendingLessonsAudioPath: null,
+  holdMicActive: false,
   useBrowserMic: true,
 };
   if (getIngestWorkflow() === "select") panel?.classList.remove("hidden");
 }
+function lessonsEffectiveTopic() {
+  return effectiveTopic($("#lessons-topic")?.value || "");
 }
+function lessonsUseRag() {
+  return Boolean($("#lessons-use-rag")?.checked);
 }
+function lessonsLanguage() {
+  const select = $("#lessons-language");
+  if (!select) return "en";
+  if (select.value === "other") {
+    return ($("#lessons-other-lang")?.value.trim() || "en").toLowerCase();
+  }
+  return select.value || "en";
+}
+function lessonsCoachVariant() {
+  return $("#lessons-coach-variant")?.value || "tiny-aya-global";
+}
+function lessonsAutoSpeak() {
+  return Boolean($("#lessons-auto-speak")?.checked);
+}
+function lessonsHasVoiceOut(language) {
+  const code = (language || "en").split("-")[0];
+  return (state.voicePresets?.voice_languages || []).includes(code);
+}
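
Review note: the language helpers above mix DOM lookups with a small amount of logic (the "other" fallback, lowercasing, and the primary-subtag match for VoiceOut). A minimal sketch of that logic as pure functions follows, so it can be exercised outside the browser. The names `resolveLessonLanguage` and `hasVoiceOut` and the sample language lists are hypothetical and not part of this commit.

```javascript
// Hypothetical pure versions of lessonsLanguage() / lessonsHasVoiceOut().
// DOM reads are replaced by plain arguments so the logic runs in Node.
function resolveLessonLanguage(selectValue, otherText) {
  if (selectValue === "other") {
    // Free-text code typed by the learner; fall back to English when blank.
    return ((otherText || "").trim() || "en").toLowerCase();
  }
  return selectValue || "en";
}

function hasVoiceOut(language, voiceLanguages) {
  // Compare on the primary subtag only, so "pt-br" matches "pt".
  const code = (language || "en").split("-")[0];
  return (voiceLanguages || []).includes(code);
}

console.log(resolveLessonLanguage("other", "  PT-BR ")); // "pt-br"
console.log(hasVoiceOut("pt-br", ["en", "pt"])); // true
console.log(hasVoiceOut("sw", ["en", "pt"])); // false
```

As in the diff, only the free-text branch is lowercased; values coming from the `<select>` are passed through unchanged.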
+function chatMessageText(content) {
   if (content == null) return "";
   if (typeof content === "string") return content;
   if (Array.isArray(content)) {
   );
 }
+function chatMessageAudio(content) {
+  if (!Array.isArray(content)) return null;
+  const filePart = content.find((part) => part && typeof part === "object" && part.path);
+  return filePart?.path || null;
+}
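
Review note: `chatMessageAudio` only returns a path when the message content is an array containing an object part with a `path` field. The sketch below repeats the function from this hunk and runs it against a few sample shapes. The mixed text-plus-file array and the file path are assumptions made for illustration; the diff does not show what the backend actually sends.

```javascript
// Copied from the hunk above so the example is self-contained.
function chatMessageAudio(content) {
  if (!Array.isArray(content)) return null;
  const filePart = content.find((part) => part && typeof part === "object" && part.path);
  return filePart?.path || null;
}

// Assumed shapes: a plain string, and an array mixing text with a file part.
const textOnly = "Bonjour !";
const mixed = ["Bonjour !", { path: "/tmp/reply.wav" }];
const noFile = ["Bonjour !", { text: "no audio here" }];

console.log(chatMessageAudio(textOnly)); // null
console.log(chatMessageAudio(mixed)); // "/tmp/reply.wav"
console.log(chatMessageAudio(noFile)); // null
```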
+function applyLessonsIngestResult(data) {
+  $("#lessons-ingest-status").textContent = stripMd(data.status || "Ingest complete.");
   state.workspaceSessionId = data.session_id || state.workspaceSessionId;
   $("#workspace-session").value = state.workspaceSessionId;
   if (data.documents_html) {
   updateResearchRagBadge();
   updateResearchDocCount((data.documents || []).length);
   if (ingestSucceeded(data.status)) {
+    const rag = $("#lessons-use-rag");
+    if (rag) rag.checked = true;
   }
 }
+async function discoverLessonsSources() {
+  const topic = lessonsEffectiveTopic();
   if (!topic) {
+    showError("Set a lesson or workspace topic before discovering sources.");
     return;
   }
+  await withRegionLoading($(".lessons-rail-controls"), "Discovering sources…", async () => {
     const data = await callApi("discover_sources", [topic, state.workspaceSessionId]);
+    $("#lessons-ingest-status").textContent = stripMd(data.status || "Discovery complete.");
+    renderLessonsUrlChoices(data.urls || [], data.selected_urls || data.urls || []);
     if (data.session_id) {
       state.workspaceSessionId = data.session_id;
       $("#workspace-session").value = data.session_id;
   });
 }
+async function autoLessonsIngest() {
+  const topic = lessonsEffectiveTopic();
   if (!topic) {
+    showError("Set a lesson or workspace topic before auto-ingest.");
     return;
   }
+  await withRegionLoading($(".lessons-rail-controls"), "Auto-ingesting sources…", async () => {
     const data = await callApi("auto_search_ingest", [topic, state.workspaceSessionId]);
+    applyLessonsIngestResult(data);
+    state.lessonsDiscoveredUrls = [];
+    state.lessonsSelectedUrls = [];
+    renderLessonsUrlChoices([], []);
     await refreshWorkspaceSessions(state.workspaceSessionId);
   });
 }
+async function ingestLessonsSources() {
+  const topic = lessonsEffectiveTopic();
+  const pasted = $("#lessons-urls-text")?.value.trim() || "";
+  const selected = getSelectedDiscoveredUrls("#lessons-url-choices-list");
+  const files = $("#lessons-ingest-file")?.files;
   if (!pasted && !selected.length && !files?.length) {
     showError("Add URLs, select suggested sources, or upload a file — then ingest.");
     return;
   }
+  await withRegionLoading($(".lessons-rail-controls"), "Ingesting sources…", async () => {
     const paths = [];
     if (files?.length) {
       for (const file of files) {
       selected,
       paths,
     ]);
+    applyLessonsIngestResult(data);
+    if (pasted) $("#lessons-urls-text").value = "";
+    if (files?.length) $("#lessons-ingest-file").value = "";
     await refreshWorkspaceSessions(state.workspaceSessionId);
   });
 }
+function syncLessonsModeUi() {
   const placeholders = {
     explain: "e.g. How does finetuning differ from pretraining?",
     lesson: "What is the difference between pretraining and finetuning a small model?",
   };
+  const messageEl = $("#lessons-message");
+  if (messageEl) messageEl.placeholder = placeholders[state.lessonsMode] || placeholders.lesson;
 }
+function syncLessonsLanguageUi() {
+  const isOther = $("#lessons-language")?.value === "other";
+  $("#lessons-other-lang-wrap")?.classList.toggle("hidden", !isOther);
+  const lang = lessonsLanguage();
+  const note = state.voicePresets?.voiceout_note || "";
+  const voiceHint = lessonsHasVoiceOut(lang)
+    ? note
+    : "VoiceOut not available for this language — text replies only.";
+  const noteEl = $("#lessons-voiceout-note");
+  if (noteEl) noteEl.textContent = voiceHint;
+}
+function renderLessonsChat() {
+  const container = $("#lessons-chat-messages");
   if (!container) return;
   if (!state.history.length) {
     container.innerHTML =
+      '<p class="research-chat-empty">Choose a language, then type, speak, or upload audio to start your lesson.</p>';
     return;
   }
   const parts = [];
     if (item && typeof item === "object" && item.role) {
       const role = item.role === "user" ? "user" : "assistant";
       const label = role === "user" ? "You" : "Teacher";
+      let body = renderMarkdownLite(chatMessageText(item.content));
+      const audioPath = chatMessageAudio(item.content) || item.voiceout_path || null;
+      if (audioPath) {
+        body += `<audio class="chat-audio-inline" controls autoplay src="${fileUrl(audioPath)}"></audio>`;
+      }
       if (role === "assistant" && item.rag_references) {
+        body += `<div class="lessons-rag-refs">${renderMarkdownLite(item.rag_references)}</div>`;
       }
       parts.push(
         `<div class="research-chat-bubble research-chat-${role}"><div class="research-chat-role">${label}</div><div class="research-chat-body">${body}</div></div>`
   container.scrollTop = container.scrollHeight;
 }
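
Review note: the new inline player carries `autoplay` on every message that has audio. If `renderLessonsChat` rebuilds the whole message list on each turn, as the hunk suggests, older clips may be recreated with `autoplay` too and could start again alongside the newest one, depending on the browser's autoplay policy. One possible mitigation, sketched here under that assumption, is to set `autoplay` only on the latest message. The helper `audioTagFor` and the sample history are hypothetical.

```javascript
// Hypothetical helper: markup for one inline clip, autoplaying only when
// the clip belongs to the most recent message.
function audioTagFor(src, isLatest) {
  const autoplay = isLatest ? " autoplay" : "";
  return `<audio class="chat-audio-inline" controls${autoplay} src="${src}"></audio>`;
}

// Sample history; the paths are placeholders.
const history = [
  { role: "assistant", voiceout_path: "a.wav" },
  { role: "user" },
  { role: "assistant", voiceout_path: "b.wav" },
];

const tags = history
  .map((item, i) =>
    item.voiceout_path ? audioTagFor(item.voiceout_path, i === history.length - 1) : ""
  )
  .filter(Boolean);

console.log(tags.join("\n"));
```

Only the second tag carries `autoplay`; earlier clips keep their controls so the learner can still replay them manually.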
+function renderLessonsUrlChoices(urls, selected) {
+  state.lessonsDiscoveredUrls = urls || [];
+  state.lessonsSelectedUrls = selected?.length ? selected : [...state.lessonsDiscoveredUrls];
   renderUrlChoices(
     urls,
     selected,
+    "#lessons-url-choices-list",
+    "#lessons-url-choices-panel",
+    { discovered: state.lessonsDiscoveredUrls, selected: state.lessonsSelectedUrls }
   );
 }
+function applyVoiceIngestResult(data) {
+  applyLessonsIngestResult(data);
+}
+async function discoverVoiceSources() {
+  return discoverLessonsSources();
+}
+async function autoVoiceIngest() {
+  return autoLessonsIngest();
+}
+async function ingestVoiceSources() {
+  return ingestLessonsSources();
+}
+function syncVoiceModeUi() {
+  syncLessonsModeUi();
+}
+function renderVoiceChat() {
+  renderLessonsChat();
+}
+function renderVoiceUrlChoices(urls, selected) {
+  renderLessonsUrlChoices(urls, selected);
+}
+function voiceMessageText(content) {
+  return chatMessageText(content);
+}
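
Review note: the `voice*` functions above survive as thin wrappers around their `lessons*` replacements, which keeps any remaining callers working. If the old names are meant to go away later, a small aliasing helper could make leftover call sites visible. This is only a sketch of that idea; `deprecatedAlias` is a hypothetical helper and the stub `renderLessonsUrlChoices` merely echoes its arguments.

```javascript
// Hypothetical helper: forwards every call to the new function and warns
// once, the first time the old name is used.
function deprecatedAlias(oldName, fn) {
  let warned = false;
  return function (...args) {
    if (!warned) {
      warned = true;
      console.warn(`${oldName} is deprecated; call ${fn.name} instead.`);
    }
    return fn.apply(this, args);
  };
}

// Stub standing in for the real renderer in studio.js.
function renderLessonsUrlChoices(urls, selected) {
  return { urls, selected };
}

const renderVoiceUrlChoices = deprecatedAlias("renderVoiceUrlChoices", renderLessonsUrlChoices);

const result = renderVoiceUrlChoices(["https://example.com"], []);
console.log(result.urls.length); // 1
```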
 function renderSlideUrlChoices(urls, selected) {
   state.slideDiscoveredUrls = urls || [];
   state.slideSelectedUrls = selected?.length ? selected : [...state.slideDiscoveredUrls];
   }
 }
+async function initLanguageLessons() {
   const data = await callApi("voice_presets", []);
   state.voicePresets = data;
+  const langSelect = $("#lessons-language");
   if (langSelect) {
+    const opts = (data.languages || [])
       .map((o) => `<option value="${o.value}">${o.label}</option>`)
       .join("");
+    langSelect.innerHTML = `${opts}<option value="other">Other (text only)</option>`;
     langSelect.value = data.default_language || "en";
   }
+  const coachEl = document.querySelector(".lessons-coach-model");
+  if (coachEl && data.coach_chain_labels?.length) {
+    const primary = data.coach_chain_labels[0];
+    const fallback = data.coach_chain_labels[1];
+    coachEl.textContent = fallback
+      ? `Coach: ${primary} (auto-fallback: ${fallback})`
+      : `Coach: ${primary}`;
   }
+  syncLessonsLanguageUi();
+}
+async function initVoicePresets() {
+  return initLanguageLessons();
 }
 async function initSettings() {
   updateResearchRagBadge();
   await refreshWorkspaceSessions();
   await refreshDocuments();
+  await initLanguageLessons();
   await initSettings();
+  syncLessonsModeUi();
+  renderLessonsChat();
   await refreshDebugDocuments();
   const recStatus = await callApi("recording_status", []);
   state.useBrowserMic = !recStatus.backend || /unavailable|no capture/i.test(recStatus.message || "");
   const topic = effectiveTopic($("#lesson-topic").value);
   const grade = $("#lesson-grade").value;
   const slideCount = Number($("#slide-count").value);
+  const useRag = Boolean($("#lessons-use-rag")?.checked);
   const docIds = effectiveDocIds([]);
   const sourceMode = $("#slide-source-mode")?.value || "";
   const searchWorkflow = $("#slide-search-workflow")?.value || "two_step";
   );
 }
+function renderLessonsReply(data) {
   state.history = data.history ?? state.history;
+  if (state.history.length) {
     const last = state.history[state.history.length - 1];
     if (last && typeof last === "object" && last.role === "assistant") {
+      if (data.rag_references) last.rag_references = data.rag_references;
+      if (data.voiceout_path && lessonsAutoSpeak()) last.voiceout_path = data.voiceout_path;
     }
   }
+  renderLessonsChat();
   if (data.status) {
+    const statusEl = $("#lessons-turn-status");
+    if (statusEl) statusEl.textContent = stripMd(data.status);
   }
 }
+function renderVoiceReply(data, options) {
+  renderLessonsReply(data, options);
 }
+async function sendLanguageLessonTurn({ message = "", audioPath = "" } = {}) {
+  const topic = lessonsEffectiveTopic();
+  const useRag = lessonsUseRag();
   const docIds = effectiveDocIds([]);
+  const language = lessonsLanguage();
   const asr = state.voicePresets?.default_asr || null;
+  const autoVoiceout = lessonsAutoSpeak() && lessonsHasVoiceOut(language);
+  const coachVariant = lessonsCoachVariant();
+  const loadingLabel = message || audioPath ? (message ? "Teacher is thinking…" : "Processing audio…") : "Sending…";
+  await withRegionLoading($(".lessons-main-card"), loadingLabel, async () => {
+    const data = await callApi("language_lesson_turn", [
+      message,
+      audioPath || "",
+      state.lessonsMode,
       topic,
       state.workspaceSessionId,
       useRag,
       docIds,
       language,
       asr,
+      autoVoiceout,
+      "",
+      coachVariant,
     ]);
+    if (data.user_text) {
+      $("#lessons-message").value = data.user_text;
+    } else if (message) {
+      $("#lessons-message").value = "";
+    }
+    renderLessonsReply(data);
   });
 }
+async function sendLessonsTurn() {
+  const message = $("#lessons-message")?.value.trim() || "";
+  let audioPath = state.pendingLessonsAudioPath;
+  const file = $("#lessons-audio-upload")?.files?.[0];
+  if (file) audioPath = await uploadFile(file);
+  if (message) {
+    await sendLanguageLessonTurn({ message });
+    state.pendingLessonsAudioPath = null;
+    return;
   }
+  if (audioPath) {
+    await sendLanguageLessonTurn({ audioPath });
+    state.pendingLessonsAudioPath = null;
+    if ($("#lessons-audio-upload")) $("#lessons-audio-upload").value = "";
+    return;
+  }
+  showError("Type a message, hold the mic, or upload audio.");
 }
+async function sendVoiceTurn() {
+  return sendLessonsTurn();
+}
+async function sendVoiceAudioTurn(audioPath) {
+  return sendLanguageLessonTurn({ audioPath });
+}
+async function clearLessonsConversation() {
   const data = await callApi("teacher_voice_clear", []);
   state.history = [];
+  renderLessonsChat();
+  if ($("#lessons-message")) $("#lessons-message").value = "";
+  const statusEl = $("#lessons-turn-status");
+  if (statusEl) statusEl.textContent = stripMd(data.status || "Conversation cleared.");
 }
+async function clearVoiceConversation() {
+  return clearLessonsConversation();
+}
+async function startLessonsHoldMic(e) {
+  if (state.holdMicActive) return;
+  state.holdMicActive = true;
+  e?.preventDefault();
+  const holdBtn = $("#btn-lessons-hold-mic");
+  holdBtn?.classList.add("recording");
+  await startRecording(
+    "lessons",
+    $("#lessons-record-status"),
+    $("#btn-lessons-record-start"),
+    $("#btn-lessons-record-stop")
+  );
+}
+async function stopLessonsHoldMic(e) {
+  if (!state.holdMicActive) return;
+  state.holdMicActive = false;
+  e?.preventDefault();
+  $("#btn-lessons-hold-mic")?.classList.remove("recording");
+  const path = await stopRecording(
+    $("#lessons-record-status"),
+    $("#btn-lessons-record-start"),
+    $("#btn-lessons-record-stop")
+  );
+  if (path) await sendLanguageLessonTurn({ audioPath: path });
+}
+async function sendLessonsFromRecording() {
+  let path = state.pendingLessonsAudioPath;
+  const file = $("#lessons-audio-upload")?.files?.[0];
   if (file) path = await uploadFile(file);
   if (!path) {
+    showError("Record or upload audio first.");
     return;
   }
+  await sendLanguageLessonTurn({ audioPath: path });
+  state.pendingLessonsAudioPath = null;
+}
+async function sendVoiceFromRecording() {
+  return sendLessonsFromRecording();
 }
 async function startBrowserRecording(statusEl) {
     path = data.path;
     if (statusEl) statusEl.textContent = stripMd(data.status || "Recording saved.");
   }
+  if (state.recordingTarget === "lessons") state.pendingLessonsAudioPath = path;
   state.recordingTarget = null;
   return path;
 }
 function bindUi() {
   $("#slide-count").addEventListener("input", (e) => {
     $("#slide-count-val").textContent = e.target.value;
   });
   $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
+  $("#btn-lessons-send")?.addEventListener("click", () => sendLessonsTurn().catch(() => {}));
+  $("#lessons-message")?.addEventListener("keydown", (e) => {
+    if (e.key === "Enter" && !e.shiftKey) {
+      e.preventDefault();
+      sendLessonsTurn().catch(() => {});
+    }
+  });
+  $("#btn-lessons-discover")?.addEventListener("click", () => discoverLessonsSources().catch(() => {}));
+  $("#btn-lessons-auto-ingest")?.addEventListener("click", () => autoLessonsIngest().catch(() => {}));
+  $("#btn-lessons-ingest")?.addEventListener("click", () => ingestLessonsSources().catch(() => {}));
+  $("#lessons-ingest-file")?.addEventListener("change", () => ingestLessonsSources().catch(() => {}));
+  $("#btn-lessons-clear")?.addEventListener("click", () => clearLessonsConversation().catch(() => {}));
+  $("#lessons-language")?.addEventListener("change", syncLessonsLanguageUi);
+  $("#lessons-other-lang")?.addEventListener("input", syncLessonsLanguageUi);
+  $("#lessons-audio-upload")?.addEventListener("change", () => sendLessonsTurn().catch(() => {}));
+  const holdMic = $("#btn-lessons-hold-mic");
+  if (holdMic) {
+    holdMic.addEventListener("mousedown", (e) => startLessonsHoldMic(e).catch(() => {}));
+    holdMic.addEventListener("mouseup", (e) => stopLessonsHoldMic(e).catch(() => {}));
+    holdMic.addEventListener("mouseleave", (e) => {
+      if (state.holdMicActive) stopLessonsHoldMic(e).catch(() => {});
+    });
+    holdMic.addEventListener("touchstart", (e) => startLessonsHoldMic(e).catch(() => {}), { passive: false });
+    holdMic.addEventListener("touchend", (e) => stopLessonsHoldMic(e).catch(() => {}));
+  }
+  $("#btn-lessons-record-start")?.addEventListener("click", () =>
+    startRecording(
+      "lessons",
+      $("#lessons-record-status"),
+      $("#btn-lessons-record-start"),
+      $("#btn-lessons-record-stop")
+    ).catch(() => {})
+  );
+  $("#btn-lessons-record-stop")?.addEventListener("click", () =>
+    stopRecording(
+      $("#lessons-record-status"),
+      $("#btn-lessons-record-start"),
+      $("#btn-lessons-record-stop")
+    ).catch(() => {})
+  );
   $("#btn-debug-send").addEventListener("click", () => sendDebugMessage().catch(() => {}));
   $("#debug-session")?.addEventListener("change", () => refreshDebugDocuments().catch(() => {}));
   $("#debug-refresh-sessions")?.addEventListener("click", () => {
     refreshDebugSessions().catch(() => {});
     }
   });
   $("#btn-export").addEventListener("click", () => {
     const p = state.downloads?.pptx;
     if (p) window.open(fileUrl(p), "_blank");
     refreshDocuments().catch(() => {});
   });
+  document.querySelectorAll("#lessons-modes .mode-card").forEach((btn) => {
     btn.addEventListener("click", () => {
+      document.querySelectorAll("#lessons-modes .mode-card").forEach((b) => b.classList.remove("active"));
       btn.classList.add("active");
+      state.lessonsMode = btn.dataset.mode;
+      syncLessonsModeUi();
     });
   });
+  syncLessonsModeUi();
 }
 bindUi();

libs/echocoach/src/echocoach/config.py CHANGED

@@ -45,12 +45,23 @@ class EchoCoachConfig:
     tts_preset: str
     realtime_tts_preset: str | None
     coach_model: str
     max_seconds: int
     languages: list[LanguageOption]
     asr_presets: dict[str, AsrPreset]
     tts_presets: dict[str, TtsPreset]
     presets_path: Path | None = None
     def get_asr(self, key: str | None = None) -> AsrPreset:
         preset_key = key or self.asr_preset
         if preset_key not in self.asr_presets:
@@ -114,6 +125,7 @@ def _builtin_config() -> EchoCoachConfig:
         tts_preset="piper-multilingual",
         realtime_tts_preset=None,
         coach_model="minicpm5-1b",
         max_seconds=30,
         languages=langs,
         asr_presets=asr,
@@ -201,11 +213,15 @@ def load_echo_coach_config() -> EchoCoachConfig:
         if tts_default not in tts_presets:
             tts_default = next(iter(tts_presets))
         config = EchoCoachConfig(
             asr_preset=asr_default,
             tts_preset=tts_default,
             realtime_tts_preset=defaults.get("realtime_tts_preset"),
             coach_model=str(defaults.get("coach_model", "minicpm5-1b")),
             max_seconds=int(defaults.get("max_seconds", 30)),
             languages=languages,
             asr_presets=asr_presets,
@@ -222,6 +238,12 @@ def load_echo_coach_config() -> EchoCoachConfig:
         updates["realtime_tts_preset"] = os.environ["ECHOCOACH_REALTIME_TTS_PRESET"]
     if os.environ.get("ECHOCOACH_COACH_MODEL"):
         updates["coach_model"] = os.environ["ECHOCOACH_COACH_MODEL"]
     if os.environ.get("ECHOCOACH_MAX_SECONDS"):
         updates["max_seconds"] = int(os.environ["ECHOCOACH_MAX_SECONDS"])

     tts_preset: str
     realtime_tts_preset: str | None
     coach_model: str
+    coach_fallbacks: tuple[str, ...]
     max_seconds: int
     languages: list[LanguageOption]
     asr_presets: dict[str, AsrPreset]
     tts_presets: dict[str, TtsPreset]
     presets_path: Path | None = None
+    def coach_model_chain(self) -> list[str]:
+        """Primary coach preset followed by fallbacks (deduped, order preserved)."""
+        chain: list[str] = []
+        seen: set[str] = set()
+        for key in (self.coach_model, *self.coach_fallbacks):
+            if key and key not in seen:
+                seen.add(key)
+                chain.append(key)
+        return chain
     def get_asr(self, key: str | None = None) -> AsrPreset:
         preset_key = key or self.asr_preset
         if preset_key not in self.asr_presets:
         tts_preset="piper-multilingual",
         realtime_tts_preset=None,
         coach_model="minicpm5-1b",
+        coach_fallbacks=(),
         max_seconds=30,
         languages=langs,
         asr_presets=asr,
         if tts_default not in tts_presets:
             tts_default = next(iter(tts_presets))
+        raw_fallbacks = defaults.get("coach_fallbacks") or []
+        coach_fallbacks = tuple(str(item) for item in raw_fallbacks)
         config = EchoCoachConfig(
             asr_preset=asr_default,
             tts_preset=tts_default,
             realtime_tts_preset=defaults.get("realtime_tts_preset"),
             coach_model=str(defaults.get("coach_model", "minicpm5-1b")),
+            coach_fallbacks=coach_fallbacks,
             max_seconds=int(defaults.get("max_seconds", 30)),
             languages=languages,
             asr_presets=asr_presets,
         updates["realtime_tts_preset"] = os.environ["ECHOCOACH_REALTIME_TTS_PRESET"]
     if os.environ.get("ECHOCOACH_COACH_MODEL"):
         updates["coach_model"] = os.environ["ECHOCOACH_COACH_MODEL"]
+    if os.environ.get("ECHOCOACH_COACH_FALLBACK"):
+        updates["coach_fallbacks"] = tuple(
+            part.strip()
+            for part in os.environ["ECHOCOACH_COACH_FALLBACK"].split(",")
+            if part.strip()
+        )
     if os.environ.get("ECHOCOACH_MAX_SECONDS"):
         updates["max_seconds"] = int(os.environ["ECHOCOACH_MAX_SECONDS"])
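
The fallback handling added to config.py above can be tried in isolation. This is a minimal sketch, not the module itself: `coach_model_chain` is lifted out of `EchoCoachConfig` into a free function, and `parse_fallback_env` is a hypothetical helper name wrapping the same comma-splitting that the `ECHOCOACH_COACH_FALLBACK` branch performs inline.

```python
from __future__ import annotations


def coach_model_chain(primary: str, fallbacks: tuple[str, ...]) -> list[str]:
    """Primary preset first, then fallbacks; skip empty keys and repeats, keep order."""
    chain: list[str] = []
    seen: set[str] = set()
    for key in (primary, *fallbacks):
        if key and key not in seen:
            seen.add(key)
            chain.append(key)
    return chain


def parse_fallback_env(raw: str) -> tuple[str, ...]:
    """Split a comma-separated value, trimming whitespace and dropping blank parts.

    Hypothetical helper: the diff does this inline on os.environ.
    """
    return tuple(part.strip() for part in raw.split(",") if part.strip())


fallbacks = parse_fallback_env(" minicpm5-1b, ,tiny-aya-global ")
chain = coach_model_chain("tiny-aya-global", fallbacks)
print(chain)
```

Because a repeated key is dropped, listing the primary model again among the fallbacks is harmless, which is the case `test_coach_model_chain_dedupes` exercises.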

libs/echocoach/src/echocoach/pipeline.py CHANGED

@@ -64,9 +64,14 @@ def run_echo_coach(
     transcript = asr.transcribe(str(clipped_path), language=language)
     trace.log_note("asr_complete", preset=asr_key, chars=len(transcript))
-    fillers = analyze_fillers(transcript)
     pace = analyze_pace(transcript, duration)
-    transcript_html = highlight_fillers_html(transcript, fillers)
     filler_chart, pace_chart = build_charts(
         transcript,

     transcript = asr.transcribe(str(clipped_path), language=language)
     trace.log_note("asr_complete", preset=asr_key, chars=len(transcript))
+    fillers = analyze_fillers(transcript) if language == "en" else FillerAnalysis(counts={}, spans=[], total=0)
     pace = analyze_pace(transcript, duration)
+    if language == "en":
+        transcript_html = highlight_fillers_html(transcript, fillers)
+    else:
+        import html
+        transcript_html = html.escape(transcript).replace("\n", "<br>")
     filler_chart, pace_chart = build_charts(
         transcript,
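
For non-English transcripts the pipeline change above skips filler highlighting and falls back to escaped plain text. A minimal sketch of that fallback on its own, assuming only the standard library (`plain_transcript_html` is a hypothetical name; the diff does this inline):

```python
import html


def plain_transcript_html(transcript: str) -> str:
    """Escape markup-significant characters, then turn newlines into <br> tags."""
    return html.escape(transcript).replace("\n", "<br>")


rendered = plain_transcript_html("<b>Bonjour</b> & merci\nÀ demain")
print(rendered)  # &lt;b&gt;Bonjour&lt;/b&gt; &amp; merci<br>À demain
```

Escaping before inserting the `<br>` tags matters: in the other order the line breaks themselves would be escaped. Note also that the gate in the diff is an exact `language == "en"` comparison, so a regional tag such as `en-US` would take the plain-text branch if such values ever reach this function.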

libs/echocoach/src/echocoach/prompts.py CHANGED

@@ -12,22 +12,49 @@ MODE_LABELS: dict[TeacherVoiceMode, str] = {
     "pitch": "Pitch practice",
 }
 EXPLAIN_SYSTEM = """You are TeacherVoice, a friendly tutor who explains ideas in plain language.
 Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
 numbered outlines, or phrases like "let me think" or "first I need to".
-Use simple examples when helpful. If the student asks in another language, reply in that language.
 When source excerpts are provided, ground your answer in them and cite with [1], [2], etc."""
 LESSON_SYSTEM = """You are TeacherVoice, a lesson-planning coach for teachers and students.
 Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
 or meta commentary about how you will answer.
 Help outline and explain lesson content verbally: learning goals, key points, and a simple flow.
-If a lesson topic is set, stay focused on it. When source excerpts are provided, use them and cite [1], [2], etc."""
 PITCH_SYSTEM = """You are TeacherVoice, a supportive public-speaking coach in a live conversation.
 Give brief, actionable feedback on what the student just said (opening, clarity, energy, structure).
 Do not produce JSON or long reports — speak naturally in 2-4 sentences.
-Suggest one concrete improvement for their next attempt. For charts and pace analysis, expand **Deep pitch analysis** below the chat."""
 _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
     "explain": EXPLAIN_SYSTEM,
@@ -36,8 +63,39 @@ _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
 }
-def system_prompt_for_mode(mode: TeacherVoiceMode) -> str:
-    return _MODE_SYSTEM[mode]
 def topic_context_block(topic: str | None, mode: TeacherVoiceMode) -> str | None:

     "pitch": "Pitch practice",
 }
+LANGUAGE_LESSON_MODES: frozenset[TeacherVoiceMode] = frozenset({"explain", "lesson"})
+# ISO 639-1 codes mapped to Tiny Aya regional presets (see Cohere Labs field guide).
+_AYA_FIRE_LANGS = frozenset({"hi", "bn", "ta", "te", "mr", "gu", "kn", "ml", "pa", "ur", "ne", "si"})
+_AYA_EARTH_LANGS = frozenset({"ar", "sw", "am", "ha", "fa", "he", "so", "yo", "ig", "zu", "af"})
+_AYA_WATER_LANGS = frozenset(
+    {"fr", "de", "es", "it", "pt", "nl", "pl", "el", "ja", "zh", "ko", "vi", "ru", "uk", "cs", "sv", "da", "fi", "no"}
+)
+_LANGUAGE_LABELS: dict[str, str] = {
+    "en": "English",
+    "fr": "French",
+    "de": "German",
+    "es": "Spanish",
+    "it": "Italian",
+    "pt": "Portuguese",
+    "nl": "Dutch",
+    "pl": "Polish",
+    "el": "Greek",
+    "ar": "Arabic",
+    "ja": "Japanese",
+    "zh": "Chinese",
+    "vi": "Vietnamese",
+    "ko": "Korean",
+}
 EXPLAIN_SYSTEM = """You are TeacherVoice, a friendly tutor who explains ideas in plain language.
 Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
 numbered outlines, or phrases like "let me think" or "first I need to".
+Use simple examples when helpful.
 When source excerpts are provided, ground your answer in them and cite with [1], [2], etc."""
 LESSON_SYSTEM = """You are TeacherVoice, a lesson-planning coach for teachers and students.
 Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
 or meta commentary about how you will answer.
 Help outline and explain lesson content verbally: learning goals, key points, and a simple flow.
+If a lesson topic is set, stay focused on it.
+When source excerpts are provided, use them and cite [1], [2], etc."""
 PITCH_SYSTEM = """You are TeacherVoice, a supportive public-speaking coach in a live conversation.
 Give brief, actionable feedback on what the student just said (opening, clarity, energy, structure).
 Do not produce JSON or long reports — speak naturally in 2-4 sentences.
+Suggest one concrete improvement for their next attempt. For charts and pace analysis, use Classic EchoCoach."""
 _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
     "explain": EXPLAIN_SYSTEM,
 }
+def language_label(language: str) -> str:
+    code = (language or "en").strip().lower().split("-")[0]
+    return _LANGUAGE_LABELS.get(code, code or "English")
+def language_instruction(language: str) -> str:
+    label = language_label(language)
+    return (
+        f"Target language: {label} ({language}). "
+        f"Reply ONLY in {label}. "
+        "If the student writes or speaks in another language, match their language instead."
+    )
+def resolve_aya_preset(language: str, variant: str = "auto") -> str:
+    """Return a models.yaml preset key for the Tiny Aya coach.
+    Regional Water/Fire/Earth presets remain in models.yaml for future use but
+    default to Global so Spaces only load one gated model.
+    """
+    _ = language  # language kept for API compatibility; Global handles 70+ langs
+    if variant and variant not in ("auto", ""):
+        if variant in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth"):
+            return "tiny-aya-global"
+        return variant
+    return "tiny-aya-global"
+def system_prompt_for_mode(mode: TeacherVoiceMode, *, language: str | None = None) -> str:
+    base = _MODE_SYSTEM[mode]
+    if language:
+        return f"{base}\n\n{language_instruction(language)}"
+    return base
 def topic_context_block(topic: str | None, mode: TeacherVoiceMode) -> str | None:
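
The language handling added to prompts.py above can be checked with a small standalone sketch. Assumptions: `LABELS` is a trimmed stand-in for `_LANGUAGE_LABELS`, and `with_language` is a hypothetical name for the suffixing step that `system_prompt_for_mode` performs on the mode prompt.

```python
from __future__ import annotations

# Trimmed stand-in for _LANGUAGE_LABELS in the diff.
LABELS = {"en": "English", "fr": "French"}


def language_label(language: str) -> str:
    """Normalize a tag such as 'fr-CA' to its base code and look up a display label."""
    code = (language or "en").strip().lower().split("-")[0]
    return LABELS.get(code, code or "English")


def language_instruction(language: str) -> str:
    label = language_label(language)
    return (
        f"Target language: {label} ({language}). "
        f"Reply ONLY in {label}. "
        "If the student writes or speaks in another language, match their language instead."
    )


def with_language(base: str, language: str | None = None) -> str:
    """Append the language instruction only when a language is given."""
    if language:
        return f"{base}\n\n{language_instruction(language)}"
    return base


print(language_label("fr-CA"))
print(with_language("BASE PROMPT", "fr"))
```

An unknown code falls back to the code itself as the label, so the instruction still names something the model can act on, and omitting the language leaves the base prompt unchanged, which is what keeps `PITCH_SYSTEM == system_prompt_for_mode("pitch")` true in the tests.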

libs/echocoach/src/echocoach/teacher_voice.py CHANGED

@@ -168,6 +168,7 @@ def _rag_turn_via_agent(
     model_key: str,
     backend: InferenceBackend,
     trace: TraceRecorder,
 ) -> tuple[str, str | None, str | None, str]:
     """Grounded answer via ResearchMind harness. Returns text, refs, status, display."""
     query = retrieval_query(user_text, topic=topic)
@@ -205,6 +206,7 @@ def _rag_turn_via_agent(
         mode=mode,
         backend=backend,
         trace=trace,
     )
     rag_refs = result.references_markdown or None
     return assistant_text, rag_refs, rag_status, display_reply
@@ -237,13 +239,14 @@ def _compact_teacher_reply(
     mode: TeacherVoiceMode,
     backend: InferenceBackend,
     trace: TraceRecorder,
 ) -> str:
     seed = strip_reasoning_output(raw_reply).strip() or raw_reply.strip()[:1200]
     messages = [
         {
             "role": "system",
             "content": (
-                f"{system_prompt_for_mode(mode)}\n\n"
                 "Rewrite the draft below into ONLY 2-4 spoken sentences for voice playback. "
                 "Keep any [n] citations. No planning or labels."
             ),
@@ -263,6 +266,7 @@ def _finalize_voice_reply(
     mode: TeacherVoiceMode,
     backend: InferenceBackend,
     trace: TraceRecorder,
 ) -> tuple[str, str]:
     """Normalize model output into a complete spoken reply and chat display text."""
     assistant_text = strip_reasoning_output(raw_reply).strip()
@@ -278,6 +282,7 @@ def _finalize_voice_reply(
             mode=mode,
             backend=backend,
             trace=trace,
         )
     if not reply_ends_complete_sentence(assistant_text):
         assistant_text = _compact_teacher_reply(
@@ -285,6 +290,7 @@ def _finalize_voice_reply(
             mode=mode,
             backend=backend,
             trace=trace,
         )
     return assistant_text, assistant_text
@@ -296,8 +302,9 @@ def build_teacher_messages(
     user_text: str,
     topic: str | None = None,
     rag: RagContext | None = None,
 ) -> list[dict[str, str]]:
-    system = system_prompt_for_mode(mode)
     topic_line = topic_context_block(topic, mode)
     if topic_line:
         system = f"{system}\n\n{topic_line}"
@@ -330,6 +337,7 @@ def _generate_teacher_reply(
     session_id: str,
     doc_ids: list[str] | None,
     tts_key: str,
 ) -> TeacherVoiceTurnResult:
     rag_refs: str | None = None
     rag_status: str | None = None
@@ -344,6 +352,7 @@ def _generate_teacher_reply(
             model_key=model_key,
             backend=backend,
             trace=trace,
         )
     else:
         messages = build_teacher_messages(
@@ -351,6 +360,7 @@ def _generate_teacher_reply(
             history=history,
             user_text=user_text,
             topic=topic,
         )
         raw_reply = backend.chat(messages, max_tokens=512, temperature=0.2)
         assistant_text, display_reply = _finalize_voice_reply(
@@ -358,20 +368,25 @@ def _generate_teacher_reply(
             mode=mode,
             backend=backend,
             trace=trace,
         )
         trace.log_llm(messages[-1]["content"], raw_reply)
         if mode in RAG_MODES:
             rag_status = _rag_off_status(session_id, doc_ids)
-    voiceout_path, voiceout_first, voiceout_warning = synthesize_voice_reply(
-        strip_references_for_tts(assistant_text),
-        language=language,
-        tts_preset=tts_key,
-        chunk_first=True,
-        out_subdir="teacher_voice",
-    )
-    if voiceout_path:
-        trace.set_artifact(voiceout_path)
     new_history = append_chat_turn(
         history,
@@ -409,6 +424,7 @@ def run_teacher_voice_text_turn(
     use_rag: bool = False,
     session_id: str = "",
     doc_ids: list[str] | None = None,
 ) -> TeacherVoiceTurnResult:
     """Process a typed user message (skips ASR)."""
     user_text = user_text.strip()
@@ -451,6 +467,7 @@ def run_teacher_voice_text_turn(
         session_id=session_id,
         doc_ids=doc_ids,
         tts_key=tts_key,
     )
@@ -469,6 +486,7 @@ def run_teacher_voice_turn(
     session_id: str = "",
     doc_ids: list[str] | None = None,
     max_turn_seconds: int | None = None,
 ) -> TeacherVoiceTurnResult:
     if not audio_path:
         raise ValueError("No audio recording provided.")
@@ -512,7 +530,7 @@ def run_teacher_voice_turn(
     from echocoach.omni import is_omni_profile, try_omni_turn
     if is_omni_profile():
-        system = system_prompt_for_mode(mode)
         topic_line = topic_context_block(topic, mode)
         if topic_line:
             system = f"{system}\n\n{topic_line}"
@@ -559,4 +577,5 @@ def run_teacher_voice_turn(
         session_id=session_id,
         doc_ids=doc_ids,
         tts_key=tts_key,
     )

     model_key: str,
     backend: InferenceBackend,
     trace: TraceRecorder,
+    language: str = "en",
 ) -> tuple[str, str | None, str | None, str]:
     """Grounded answer via ResearchMind harness. Returns text, refs, status, display."""
     query = retrieval_query(user_text, topic=topic)
         mode=mode,
         backend=backend,
         trace=trace,
+        language=language,
     )
     rag_refs = result.references_markdown or None
     return assistant_text, rag_refs, rag_status, display_reply
     mode: TeacherVoiceMode,
     backend: InferenceBackend,
     trace: TraceRecorder,
+    language: str = "en",
 ) -> str:
     seed = strip_reasoning_output(raw_reply).strip() or raw_reply.strip()[:1200]
     messages = [
         {
             "role": "system",
             "content": (
+                f"{system_prompt_for_mode(mode, language=language)}\n\n"
                 "Rewrite the draft below into ONLY 2-4 spoken sentences for voice playback. "
                 "Keep any [n] citations. No planning or labels."
             ),
     mode: TeacherVoiceMode,
     backend: InferenceBackend,
     trace: TraceRecorder,
+    language: str = "en",
 ) -> tuple[str, str]:
     """Normalize model output into a complete spoken reply and chat display text."""
     assistant_text = strip_reasoning_output(raw_reply).strip()
             mode=mode,
             backend=backend,
             trace=trace,
+            language=language,
         )
     if not reply_ends_complete_sentence(assistant_text):
         assistant_text = _compact_teacher_reply(
             mode=mode,
             backend=backend,
             trace=trace,
+            language=language,
         )
     return assistant_text, assistant_text
     user_text: str,
     topic: str | None = None,
     rag: RagContext | None = None,
+    language: str = "en",
 ) -> list[dict[str, str]]:
+    system = system_prompt_for_mode(mode, language=language)
     topic_line = topic_context_block(topic, mode)
     if topic_line:
         system = f"{system}\n\n{topic_line}"
     session_id: str,
     doc_ids: list[str] | None,
     tts_key: str,
+    auto_voiceout: bool = True,
 ) -> TeacherVoiceTurnResult:
     rag_refs: str | None = None
     rag_status: str | None = None
             model_key=model_key,
             backend=backend,
             trace=trace,
+            language=language,
         )
     else:
         messages = build_teacher_messages(
             history=history,
             user_text=user_text,
             topic=topic,
+            language=language,
         )
         raw_reply = backend.chat(messages, max_tokens=512, temperature=0.2)
         assistant_text, display_reply = _finalize_voice_reply(
             mode=mode,
             backend=backend,
             trace=trace,
+            language=language,
         )
         trace.log_llm(messages[-1]["content"], raw_reply)
         if mode in RAG_MODES:
             rag_status = _rag_off_status(session_id, doc_ids)
+    voiceout_path: str | None = None
+    voiceout_first: str | None = None
+    voiceout_warning: str | None = None
+    if auto_voiceout:
+        voiceout_path, voiceout_first, voiceout_warning = synthesize_voice_reply(
+            strip_references_for_tts(assistant_text),
+            language=language,
+            tts_preset=tts_key,
+            chunk_first=True,
+            out_subdir="teacher_voice",
+        )
+        if voiceout_path:
+            trace.set_artifact(voiceout_path)
     new_history = append_chat_turn(
         history,
     use_rag: bool = False,
     session_id: str = "",
     doc_ids: list[str] | None = None,
+    auto_voiceout: bool = True,
 ) -> TeacherVoiceTurnResult:
     """Process a typed user message (skips ASR)."""
     user_text = user_text.strip()
         session_id=session_id,
         doc_ids=doc_ids,
         tts_key=tts_key,
+        auto_voiceout=auto_voiceout,
     )
     session_id: str = "",
     doc_ids: list[str] | None = None,
     max_turn_seconds: int | None = None,
+    auto_voiceout: bool = True,
 ) -> TeacherVoiceTurnResult:
     if not audio_path:
         raise ValueError("No audio recording provided.")
     from echocoach.omni import is_omni_profile, try_omni_turn
     if is_omni_profile():
+        system = system_prompt_for_mode(mode, language=language)
         topic_line = topic_context_block(topic, mode)
         if topic_line:
             system = f"{system}\n\n{topic_line}"
         session_id=session_id,
         doc_ids=doc_ids,
         tts_key=tts_key,
+        auto_voiceout=auto_voiceout,
     )
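
The `auto_voiceout` flag threaded through teacher_voice.py above gates speech synthesis per turn. A minimal sketch of that gating pattern, assuming a stand-in synthesizer (`fake_synthesize` and `maybe_synthesize` are hypothetical names; the real code calls `synthesize_voice_reply` with further arguments):

```python
from typing import Callable, List, Optional, Tuple

# (voiceout_path, voiceout_first, voiceout_warning), mirroring the tuple in the diff.
VoiceResult = Tuple[Optional[str], Optional[str], Optional[str]]


def maybe_synthesize(
    text: str,
    auto_voiceout: bool,
    synthesize: Callable[[str], VoiceResult],
) -> VoiceResult:
    """Run TTS only when requested; otherwise return the all-None defaults."""
    if not auto_voiceout:
        return None, None, None
    return synthesize(text)


calls: List[str] = []


def fake_synthesize(text: str) -> VoiceResult:
    """Stand-in for the real TTS call; records what it was asked to speak."""
    calls.append(text)
    return "out.wav", "first.wav", None


skipped = maybe_synthesize("Bonjour", False, fake_synthesize)
spoken = maybe_synthesize("Bonjour", True, fake_synthesize)
print(skipped, spoken, calls)
```

Defaulting the flag to `True`, as the diff does, keeps existing callers speaking every reply while letting the Language lessons page turn synthesis off for text-only languages.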

libs/echocoach/tests/test_teacher_voice.py CHANGED

@@ -7,7 +7,7 @@ import pytest
 import soundfile as sf
 from inference.response_clean import reply_ends_complete_sentence
-from echocoach.prompts import PITCH_SYSTEM, system_prompt_for_mode
 from echocoach.teacher_voice import (
     RagContext,
     append_chat_turn,
@@ -131,8 +131,43 @@ def test_build_teacher_messages_includes_topic_and_rag():
     assert "Reply now in 2-4 complete spoken sentences only" in messages[-1]["content"]
 def test_pitch_mode_system_prompt():
-    assert "Deep pitch analysis" in system_prompt_for_mode("pitch")
     assert PITCH_SYSTEM == system_prompt_for_mode("pitch")

 import soundfile as sf
 from inference.response_clean import reply_ends_complete_sentence
+from echocoach.prompts import PITCH_SYSTEM, resolve_aya_preset, system_prompt_for_mode
 from echocoach.teacher_voice import (
     RagContext,
     append_chat_turn,
     assert "Reply now in 2-4 complete spoken sentences only" in messages[-1]["content"]
+def test_coach_model_chain_dedupes():
+    from echocoach.config import EchoCoachConfig, LanguageOption
+    cfg = EchoCoachConfig(
+        asr_preset="whisper-cpp-tiny",
+        tts_preset="piper-multilingual",
+        realtime_tts_preset=None,
+        coach_model="tiny-aya-global",
+        coach_fallbacks=("minicpm5-1b", "tiny-aya-global"),
+        max_seconds=30,
+        languages=[LanguageOption("en", "English")],
+        asr_presets={},
+        tts_presets={},
+    )
+    assert cfg.coach_model_chain() == ["tiny-aya-global", "minicpm5-1b"]
+def test_resolve_aya_preset_uses_global_only():
+    assert resolve_aya_preset("fr", "auto") == "tiny-aya-global"
+    assert resolve_aya_preset("hi", "auto") == "tiny-aya-global"
+    assert resolve_aya_preset("en", "tiny-aya-water") == "tiny-aya-global"
+def test_build_teacher_messages_includes_language_instruction():
+    messages = build_teacher_messages(
+        mode="lesson",
+        history=[],
+        user_text="Explique le fine-tuning.",
+        topic="ML",
+        language="fr",
+    )
+    assert "Target language: French" in messages[0]["content"]
+    assert "Reply ONLY in French" in messages[0]["content"]
 def test_pitch_mode_system_prompt():
+    assert "public-speaking coach" in system_prompt_for_mode("pitch")
     assert PITCH_SYSTEM == system_prompt_for_mode("pitch")

models.yaml CHANGED

@@ -67,3 +67,27 @@ models:
     backend: transformers
     model_id: ./models/finetuned/minicpm5-1b-lora-merged
     trust_remote_code: true

     backend: transformers
     model_id: ./models/finetuned/minicpm5-1b-lora-merged
     trust_remote_code: true
+  tiny-aya-global:
+    label: Tiny Aya Global 3.3B (multilingual coach)
+    backend: transformers
+    model_id: CohereLabs/tiny-aya-global
+    trust_remote_code: true
+  tiny-aya-water:
+    label: Tiny Aya Water 3.3B (European / Asia-Pacific)
+    backend: transformers
+    model_id: CohereLabs/tiny-aya-water
+    trust_remote_code: true
+  tiny-aya-fire:
+    label: Tiny Aya Fire 3.3B (South Asian)
+    backend: transformers
+    model_id: CohereLabs/tiny-aya-fire
+    trust_remote_code: true
+  tiny-aya-earth:
+    label: Tiny Aya Earth 3.3B (West Asian / African)
+    backend: transformers
+    model_id: CohereLabs/tiny-aya-earth
+    trust_remote_code: true

voice_models.yaml CHANGED

@@ -2,11 +2,13 @@
 # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
 defaults:
-  asr_preset: whisper-cpp-tiny
   tts_preset: piper-multilingual
   # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
   realtime_tts_preset: vibevoice-realtime-0.5b
-  coach_model: minicpm5-1b
   max_seconds: 30
 languages:
@@ -75,7 +77,7 @@ tts:
       pt: pt_BR-faber-medium
       nl: nl_NL-mls-medium
       pl: pl_PL-darkman-medium
-      el: en_US-lessac-medium
       ar: ar_JO-kareem-medium
       ja: ja_JP-natsuki-medium
       zh: zh_CN-huayan-medium

 # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
 defaults:
+  asr_preset: cohere-transcribe
   tts_preset: piper-multilingual
   # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
   realtime_tts_preset: vibevoice-realtime-0.5b
+  coach_model: tiny-aya-global
+  coach_fallbacks:
+    - minicpm5-1b
   max_seconds: 30
 languages:
       pt: pt_BR-faber-medium
       nl: nl_NL-mls-medium
       pl: pl_PL-darkman-medium
+      el: el_GR-rapunzelina-low
       ar: ar_JO-kareem-medium
       ja: ja_JP-natsuki-medium
       zh: zh_CN-huayan-medium