MSG committed on
Commit
9939b9d
·
1 Parent(s): 196a48f

Feat/sunday sprint 1 (#14)

* multilingual lessons

* language page wip

* language page wip lesson

* test teacher

* test teacher lessons language model

.cursor/plans/multilingual_coach_cohere_eed97371.plan.md ADDED
@@ -0,0 +1,274 @@
1
+ ---
2
+ name: Multilingual Coach Cohere
3
+ overview: Add a dedicated Studio tab — Language lessons — that unifies multilingual text chat, audio upload, and realtime-style voice in/out (Cohere Transcribe + Tiny Aya + streaming TTS) on one page, replacing the split Voice / pitch-analysis UX for the hackathon demo.
4
+ todos:
5
+ - id: aya-presets
6
+ content: Add tiny-aya-global/water/fire/earth to models.yaml; set voice_models.yaml coach_model default; verify TransformersBackend.chat()
7
+ status: completed
8
+ - id: locale-prompts
9
+ content: Add language-lesson system prompt + language_instruction() for lesson/explain modes; wire language into build_teacher_messages() and RAG path
10
+ status: completed
11
+ - id: language-lessons-page
12
+ content: "New Studio nav tab Language lessons: language selector, unified composer (text + mic + upload), chat with inline audio, auto VoiceOut via realtime TTS"
13
+ status: completed
14
+ - id: language-lessons-api
15
+ content: Extend teacher_voice_* API with auto_voiceout flag; reuse existing turn pipeline; optional speak-on-reply default for Language lessons view
16
+ status: completed
17
+ - id: cohere-space-defaults
18
+ content: "Document and set Space secrets: ECHOCOACH_ASR_PRESET=cohere-transcribe, ECHOCOACH_COACH_MODEL=tiny-aya-global, ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b"
19
+ status: completed
20
+ - id: echocoach-i18n-polish
21
+ content: Move Deep pitch analysis to collapsed Advanced or Classic-only; gate English-only filler metrics; fix el Piper voice mapping
22
+ status: completed
23
+ - id: demo-docs
24
+ content: "Update README judge script: single Language lessons tab demo (14-lang voice + 70-lang text); Cohere Labs partner narrative"
25
+ status: completed
26
+ isProject: false
27
+ ---
28
+
29
+ # Language lessons — one tab, text + audio + realtime voice (Cohere stack)
30
+
31
+ ## Goal
32
+
33
+ Replace the current split **Voice** experience (TeacherVoice chat + buried EchoCoach pitch panel) with **one primary Studio page: Language lessons** — a multilingual learning coach where the user can interact the same way throughout:
34
+
35
+ | Input | Output |
36
+ |-------|--------|
37
+ | **Text** — type a question or lesson prompt | **Text** — chat bubbles in target language |
38
+ | **Mic** — hold / push-to-talk recording | **Audio** — auto-play teacher reply (realtime TTS when available) |
39
+ | **Upload** — `.wav` / `.mp3` clip | **Optional** — replay last reply, toggle auto-speak |
40
+
41
+ Backend stays **turn-based** (speak → wait → hear reply), but the page should *feel* realtime: mic stops → transcript appears → first audio chunk plays quickly via VibeVoice Realtime, with Piper fallback.
42
+
43
+ Partner stack ([Cohere Labs guide](https://build-small-hackathon-field-guide.hf.space/partners/cohere)): **Cohere Transcribe** (speech in) + **Tiny Aya** (coach brain, 70 langs) + **Piper / VibeVoice** (speech out).
44
+
45
+ ---
46
+
47
+ ## What you already have (reuse, don’t rewrite)
48
+
49
+ | Building block | Location | Reuse for Language lessons |
50
+ |---|---|---|
51
+ | Multi-turn coach pipeline | [`libs/echocoach/src/echocoach/teacher_voice.py`](libs/echocoach/src/echocoach/teacher_voice.py) | Same `run_teacher_voice_turn` / `run_teacher_voice_text_turn` |
52
+ | Lesson + explain prompts | [`libs/echocoach/src/echocoach/prompts.py`](libs/echocoach/src/echocoach/prompts.py) | `lesson` + `explain` modes (drop pitch from this page) |
53
+ | 14-language ASR/TTS config | [`voice_models.yaml`](voice_models.yaml) | Language dropdown + Cohere ASR + Piper voices |
54
+ | Cohere Transcribe backend | [`libs/echocoach/src/echocoach/asr/cohere.py`](libs/echocoach/src/echocoach/asr/cohere.py) | Default ASR on Space |
55
+ | Streaming TTS | [`libs/echocoach/src/echocoach/tts/vibevoice.py`](libs/echocoach/src/echocoach/tts/vibevoice.py) + `voiceout.py` | `chunk_first=True` already used for TeacherVoice |
56
+ | Studio API | [`apps/gradio-space/src/gradio_space/api/studio.py`](apps/gradio-space/src/gradio_space/api/studio.py) | `teacher_voice_turn`, `teacher_voice_audio_turn`, `voice_presets` |
57
+ | RAG grounding | ResearchMind via `teacher_voice.py` | Optional “Answer from my sources” toggle |
58
+ | Recording helpers | [`studio.js`](apps/gradio-space/static/studio/studio.js) `recordingTarget`, mic start/stop | Extend for hold-to-talk on Language lessons page |
59
+
60
+ **Not on this page:** EchoCoach one-shot pitch JSON report → move to **Classic** `/classic` EchoCoach tab only, or a collapsed “Pitch analysis (advanced)” link so Language lessons stays focused on learning.
61
+
62
+ ---
63
+
64
+ ## Page design — Language lessons tab
65
+
66
+ ### Navigation
67
+
68
+ In [`apps/gradio-space/static/studio/index.html`](apps/gradio-space/static/studio/index.html):
69
+
70
+ - Add/rename sidebar item: **`Language lessons`** (`data-view="language-lessons"`) with icon `translate` or `school`.
71
+ - Demote current **Voice** nav (pitch + mixed modes) → remove from primary nav, or keep **Voice** as alias redirecting to Language lessons for one release cycle.
72
+ - Classic `/classic` keeps full TeacherVoice + EchoCoach tabs unchanged.
73
+
74
+ ### Layout (single page)
75
+
76
+ ```text
77
+ ┌─────────────────────────────────────────────────────────────┐
78
+ │ Language lessons │
79
+ │ Learn in your language — text, voice, or upload audio │
80
+ ├──────────────┬──────────────────────────────────────────────┤
81
+ │ LEFT RAIL │ MAIN — conversation │
82
+ │ │ │
83
+ │ Target lang ▼│ [User bubble — text or transcript] │
84
+ │ Coach model │ [Teacher bubble — text + inline ▶ audio] │
85
+ │ (Aya Global)│ ... │
86
+ │ │ │
87
+ │ Lesson topic │ ── UNIFIED COMPOSER ────────────────────── │
88
+ │ │ [ Text area — always visible ] │
89
+ │ ☑ Use sources│ [ 🎤 Hold to speak ] [ 📎 Upload audio ] │
90
+ │ │ [ Send ] ☑ Auto-speak replies │
91
+ │ Add sources │ Status: Listening… / Transcribing… / … │
92
+ │ (details) │ │
93
+ └──────────────┴──────────────────────────────────────────────┘
94
+ ```
95
+
96
+ **Left rail controls**
97
+
98
+ - **Target language** — required; populated from `voice_presets.languages` (14 voice langs).
99
+ - **Coach variant** (optional Advanced): Auto regional → Tiny Aya Global / Water / Fire / Earth.
100
+ - **Lesson topic** — defaults to workspace topic; grounds lesson mode.
101
+ - **Use indexed sources** — same as current `#use-rag`; applies to explain + lesson.
102
+ - **Add sources** — reuse voice-rail ingest (discover, URLs, PDF) or link to Research view.
103
+
104
+ **Main conversation**
105
+
106
+ - Messages format: user shows typed text or “🎤 transcript”; assistant shows reply text + embedded `<audio controls autoplay>` when VoiceOut path returned.
107
+ - Empty state copy: “Choose a language, then type, speak, or upload audio to start your lesson.”
108
+
109
+ **Unified composer (one place for all input modes)**
110
+
111
+ 1. **Text** — textarea + **Send** → `teacher_voice_turn` with `mode=lesson` (default) or toggle **Explain** vs **Lesson coach** (two small pills, not three modes).
112
+ 2. **Mic** — **Hold to speak** (mousedown/touchstart → record, release → stop → auto `teacher_voice_audio_turn`). Reuse existing `recordingTarget` pattern; set `state.recordingTarget = "language-lessons"`.
113
+ 3. **Upload** — file input → preview waveform/name → **Send audio** or auto-send on select.
114
+ 4. **Auto-speak replies** — checkbox default **on**; passes through to API so server always synthesizes TTS (already default in pipeline when `synthesize_voice_reply` runs).
115
+
116
+ **Realtime voice output behavior**
117
+
118
+ - Use `ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b` for Language lessons page (14 langs experimental on VibeVoice; fallback to Piper per lang).
119
+ - Frontend: on response, `autoplay` first audio element; show “Speaking…” while playing.
120
+ - Honest scope: **not** full-duplex WebSocket; latency target is “release mic → hear teacher within ~1–3s on GPU” via chunked TTS already in `voiceout.py`.
121
+
122
+ **70-language text demo (no voice required)**
123
+
124
+ - Language dropdown includes **“Other (text only)”** free-text ISO/code field OR a second “LLM language” field for codes outside Piper set (e.g. `hi`, `sw`).
125
+ - Helper: “Voice in/out: 14 languages · Coach understands 70+ with Tiny Aya.”
126
+ - When language has no Piper voice, show text reply only + banner “VoiceOut not available for this language.”
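
The text-only fallback described above can be sketched as a small decision helper. This is a minimal illustration, not the project's implementation: `ReplyPlan`, `plan_reply_output`, and the `voice_languages` argument are hypothetical names, and the set of voice-capable codes is assumed to come from the languages listed by `voice_presets`.

```python
from dataclasses import dataclass
from typing import Collection, Optional

# Banner copy taken from the plan text; everything else here is hypothetical.
NO_VOICE_BANNER = "VoiceOut not available for this language."


@dataclass(frozen=True)
class ReplyPlan:
    """What the page should do with a coach reply for a given language."""

    synthesize_audio: bool
    banner: Optional[str]


def plan_reply_output(
    language: str,
    voice_languages: Collection[str],
    auto_speak: bool = True,
) -> ReplyPlan:
    """Decide between text + audio and text-only output.

    `voice_languages` is assumed to hold the language codes that have a
    TTS voice configured (for example the 14 voice languages).
    """
    code = language.strip().lower()
    if code not in voice_languages:
        # Coach can still answer in text; tell the user why there is no audio.
        return ReplyPlan(synthesize_audio=False, banner=NO_VOICE_BANNER)
    # Voice exists: honour the user's Auto-speak toggle.
    return ReplyPlan(synthesize_audio=auto_speak, banner=None)
```

Keeping the decision in one place means the frontend only has to render the banner when it is present, instead of re-deriving voice coverage itself.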
127
+
128
+ ---
129
+
130
+ ## Target architecture
131
+
132
+ ```mermaid
133
+ flowchart TB
134
+ subgraph page [Language lessons page]
135
+ TextIn[Text composer]
136
+ MicIn[Hold-to-talk mic]
137
+ FileIn[Audio upload]
138
+ end
139
+
140
+ TextIn --> Turn[teacher_voice turn]
141
+ MicIn --> ASR[Cohere Transcribe 2B]
142
+ FileIn --> ASR
143
+ ASR --> Turn
144
+ Turn --> Aya[Tiny Aya Global or regional]
145
+ RAG[ResearchMind RAG] --> Aya
146
+ Aya --> Reply[Lesson reply text]
147
+ Reply --> TTS[VibeVoice Realtime or Piper]
148
+ TTS --> AutoPlay[Inline autoplay audio]
149
+ Reply --> Chat[Chat bubbles]
150
+ ```
151
+
152
+ ---
153
+
154
+ ## Gaps to close (updated)
155
+
156
+ 1. **No dedicated Language lessons view** — today everything lives under generic **Voice** with pitch mode + EchoCoach panel ([`index.html` L303–419](apps/gradio-space/static/studio/index.html)).
157
+ 2. **Language not wired in Studio JS** — hardcoded `default_language` in [`studio.js`](apps/gradio-space/static/studio/studio.js) (~L1187).
158
+ 3. **Split send paths** — “Send text” vs “Send voice turn” should become one flow with auto-routing by input type.
159
+ 4. **Manual replay buttons** — “Speak full reply” should be default-on for Language lessons; keep replay as secondary.
160
+ 5. **Coach LLM** — still MiniCPM5 1B; need Tiny Aya presets for multilingual quality.
161
+ 6. **Default ASR** — Whisper tiny, not Cohere Transcribe.
162
+ 7. **Pitch/EchoCoach clutter** — remove from primary Language lessons UX.
163
+
164
+ ---
165
+
166
+ ## Implementation plan
167
+
168
+ ### 1. Backend — Tiny Aya + locale prompts (unchanged core)
169
+
170
+ Add to [`models.yaml`](models.yaml):
171
+
172
+ | Preset | HF model_id |
173
+ |--------|-------------|
174
+ | `tiny-aya-global` | `CohereLabs/tiny-aya-global` |
175
+ | `tiny-aya-water` | `CohereLabs/tiny-aya-water` |
176
+ | `tiny-aya-fire` | `CohereLabs/tiny-aya-fire` |
177
+ | `tiny-aya-earth` | `CohereLabs/tiny-aya-earth` |
178
+
179
+ Set `voice_models.yaml` → `defaults.coach_model: tiny-aya-global`.
180
+
181
+ In [`prompts.py`](libs/echocoach/src/echocoach/prompts.py):
182
+
183
+ - Add `LANGUAGE_LESSON_SYSTEM` (or extend `LESSON_SYSTEM` / `EXPLAIN_SYSTEM`) with explicit target-language instruction.
184
+ - Add `language_instruction(language: str) -> str` injected in `build_teacher_messages()`.
185
+
186
+ Optional `resolve_aya_preset(language)` for Water/Fire/Earth when user picks “Auto regional”.
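
A minimal sketch of the two helpers named above, under stated assumptions. The function names come from the plan, but the bodies are illustrative: the `LANGUAGE_NAMES` table, the instruction wording, and the `REGIONAL_PRESETS` mapping are hypothetical. Only the Hindi to Fire pairing is suggested by the demo script; any other regional assignment would need to be checked against the Tiny Aya model cards.

```python
from typing import Mapping, Optional

# Illustrative subset only; the real list would come from voice_models.yaml.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "es": "Spanish", "hi": "Hindi"}

# Hypothetical regional routing. Only "hi" -> Fire is hinted at by the demo script.
REGIONAL_PRESETS = {"hi": "tiny-aya-fire"}


def language_instruction(language: str) -> str:
    """Return a system-prompt line that pins the reply language."""
    code = language.strip().lower()
    # Fall back to the raw code for languages without a display name.
    name = LANGUAGE_NAMES.get(code, code)
    return f"Always reply in {name} (language code: {code}) unless the learner asks otherwise."


def resolve_aya_preset(
    language: str,
    regional: Optional[Mapping[str, str]] = None,
    default: str = "tiny-aya-global",
) -> str:
    """Pick a regional Tiny Aya preset, falling back to the global model."""
    table = REGIONAL_PRESETS if regional is None else regional
    return table.get(language.strip().lower(), default)
```

Falling back to `tiny-aya-global` keeps "Auto regional" safe for any code that has no explicit regional entry.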
187
+
188
+ ### 2. Backend — Language lessons API surface
189
+
190
+ In [`studio.py`](apps/gradio-space/src/gradio_space/api/studio.py):
191
+
192
+ - Add thin wrapper `api_language_lesson_turn(...)` OR alias existing endpoints with fixed `mode` default `lesson`.
193
+ - Parameters: `message`, `audio_path`, `language`, `topic`, `use_rag`, `history`, `mode` (`lesson`|`explain`), `auto_voiceout=True`, `coach_model` optional override.
194
+ - Ensure `language` is always passed through to ASR + TTS + prompts (no default-only path from frontend).
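
The thin wrapper described above could route by input type roughly as follows. This is a sketch, not the code in `studio.py`: `route_language_lesson_turn` is a hypothetical name, the two turn functions are injected as plain callables so the example runs without the real pipeline, and the `{"ok": False, "error": ...}` shape is an assumption about what the `err(...)` helper returns.

```python
from typing import Any, Callable, Dict, Optional

TurnFn = Callable[..., Dict[str, Any]]

# The plan limits this page to two modes (no pitch mode).
ALLOWED_MODES = {"lesson", "explain"}


def route_language_lesson_turn(
    *,
    message: str = "",
    audio_path: Optional[str] = None,
    language: str,
    mode: str = "lesson",
    auto_voiceout: bool = True,
    text_turn: TurnFn,
    audio_turn: TurnFn,
) -> Dict[str, Any]:
    """Send one turn down the text or audio path, always carrying `language`."""
    if mode not in ALLOWED_MODES:
        return {"ok": False, "error": f"Unsupported mode: {mode}"}
    if not language.strip():
        # No silent default: the frontend must always send a language.
        return {"ok": False, "error": "A target language is required."}

    shared = {"language": language, "mode": mode, "auto_voiceout": auto_voiceout}
    text = message.strip()
    if text:
        # Typed text wins when both text and audio are present.
        return text_turn(text, **shared)
    if audio_path:
        return audio_turn(audio_path, **shared)
    return {"ok": False, "error": "Provide a message or an audio clip."}
```

Making `language` a required keyword mirrors the requirement that no default-only path exists between the frontend and ASR, TTS, and prompts.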
195
+
196
+ Register in Studio HTML boot (`initLanguageLessons()` parallel to `initVoicePresets()`).
197
+
198
+ ### 3. Frontend — new Language lessons page
199
+
200
+ Files: [`studio_html.py`](apps/gradio-space/src/gradio_space/ui/studio_html.py) (fragment), [`index.html`](apps/gradio-space/static/studio/index.html), [`studio.js`](apps/gradio-space/static/studio/studio.js), [`studio.css`](apps/gradio-space/static/studio/studio.css).
201
+
202
+ - New `<section class="col col-studio" data-view-panel="language-lessons">` with layout above.
203
+ - JS module: `state.languageLesson = { language, mode, autoSpeak, history }`.
204
+ - Wire nav `data-view="language-lessons"` in existing view switcher.
205
+ - **Hold-to-talk**: pointerdown on `#btn-lesson-hold-mic` → start recording; pointerup → stop → `sendLanguageLessonAudioTurn(path)`.
206
+ - **Unified send**: if textarea non-empty → text turn; else if pending audio → audio turn.
207
+ - **Render**: extend chat renderer to show inline audio on assistant messages (reuse `renderVoiceReply` patterns).
208
+ - Remove pitch mode cards and `#voice-pitch-analysis` from this view (Classic EchoCoach tab remains).
209
+
210
+ ### 4. Space defaults (Cohere partner demo)
211
+
212
+ ```bash
213
+ ECHOCOACH_ASR_PRESET=cohere-transcribe
214
+ ECHOCOACH_COACH_MODEL=tiny-aya-global
215
+ ECHOCOACH_TTS_PRESET=piper-multilingual
216
+ ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
217
+ ```
218
+
219
+ Document in [`USAGE.md`](USAGE.md). GPU Space recommended.
220
+
221
+ ### 5. Polish & demote pitch analysis
222
+
223
+ - Gate English-only filler metrics in EchoCoach when `language != "en"`.
224
+ - Fix Greek Piper mapping (`el`) in `voice_models.yaml`.
225
+ - EchoCoach deep analysis: Classic tab only, or footer link “Practice a monologue (pitch metrics)” opening Classic.
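
Gating the English-only filler metrics might look like the sketch below. The metric keys (`filler_count`, `filler_rate`) and the function name `select_metrics` are hypothetical, chosen for illustration; the actual EchoCoach metric names are not shown in this plan.

```python
from typing import Dict, FrozenSet

# Hypothetical metric keys that depend on an English filler-word list.
ENGLISH_ONLY_METRICS: FrozenSet[str] = frozenset({"filler_count", "filler_rate"})


def select_metrics(
    metrics: Dict[str, float],
    language: str,
    english_only: FrozenSet[str] = ENGLISH_ONLY_METRICS,
) -> Dict[str, float]:
    """Drop English-only metrics unless the session language is English."""
    # Treat regional variants such as "en-US" or "en_GB" as English.
    base = language.strip().lower().replace("_", "-").split("-")[0]
    if base == "en":
        return dict(metrics)
    return {key: value for key, value in metrics.items() if key not in english_only}
```

Dropping the keys, rather than reporting zeros, avoids showing a misleading "0 fillers" score for languages the filler list does not cover.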
226
+
227
+ ### 6. Demo script (single tab)
228
+
229
+ Update [`README.md`](README.md) / [`apps/gradio-space/README.md`](apps/gradio-space/README.md):
230
+
231
+ 1. Open **Language lessons**.
232
+ 2. Select **French** → hold mic → ask “Explique le fine-tuning en termes simples.” → hear Piper/VibeVoice reply.
233
+ 3. Switch to **Spanish**, type a follow-up question (text in, text + audio out).
234
+ 4. Select **Hindi** (text-only) → show Tiny Aya Fire-quality written lesson snippet.
235
+ 5. Toggle **Use sources** after ingesting one PDF in Research.
236
+
237
+ Badge line: **Cohere Labs** — Transcribe + Tiny Aya on one local Language lessons page.
238
+
239
+ ### 7. Tests
240
+
241
+ [`libs/echocoach/tests/test_teacher_voice.py`](libs/echocoach/tests/test_teacher_voice.py):
242
+
243
+ - `build_teacher_messages(..., language="fr")` contains French instruction.
244
+ - Optional: API contract test that `language` propagates to mock ASR call.
245
+
246
+ ---
247
+
248
+ ## What you do **not** need for hackathon MVP
249
+
250
+ - Full duplex / interruptible WebSocket conversation
251
+ - TTS for all 70 Tiny Aya languages
252
+ - Replacing ResearchMind embeddings with multilingual models
253
+ - Keeping pitch practice on the same page as Language lessons
254
+
255
+ ---
256
+
257
+ ## Risk notes
258
+
259
+ | Risk | Mitigation |
260
+ |------|------------|
261
+ | GPU RAM (Transcribe 2B + Aya 3.3B) | Sequential load on ZeroGPU; dev fallback whisper + Aya |
262
+ | VibeVoice lang coverage gaps | Piper fallback per `voice_models.yaml`; text-only banner |
263
+ | Hold-to-talk on mobile browsers | Push-to-talk fallback buttons (start/stop) |
264
+ | Scope creep from 3-mode Voice tab | Language lessons = **lesson + explain only** |
265
+
266
+ ---
267
+
268
+ ## Suggested execution order
269
+
270
+ 1. Tiny Aya presets + locale prompts (quality foundation)
271
+ 2. **Language lessons page** HTML/JS/CSS + unified composer
272
+ 3. Wire language + auto_voiceout through API
273
+ 4. Space env defaults (Cohere ASR + realtime TTS)
274
+ 5. Demote EchoCoach pitch from Studio; docs + demo script
.env.example CHANGED
@@ -52,11 +52,17 @@ ALLOW_MODEL_SWITCH=false
52
  # After training, point Gradio at the adapter preset:
53
  # ACTIVE_MODEL=minicpm5-1b-lesson-lora
54
 
55
- # --- EchoCoach (voice practice coach) ---
56
  # VOICE_PRESETS_PATH=./voice_models.yaml
57
- # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
 
 
 
 
58
  # ECHOCOACH_TTS_PRESET=piper-multilingual
59
- # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b # TeacherVoice VoiceOut (falls back to Piper)
 
 
60
  # ECHOCOACH_COACH_MODEL=minicpm5-1b
61
  # ECHOCOACH_MAX_SECONDS=30
62
  # ECHOCOACH_CAPTURE_DEVICE= # optional ALSA/PipeWire device (e.g. pipewire, alsa_input.pci-...)
 
52
  # After training, point Gradio at the adapter preset:
53
  # ACTIVE_MODEL=minicpm5-1b-lesson-lora
54
 
55
+ # --- EchoCoach / Language lessons (voice stack) ---
56
  # VOICE_PRESETS_PATH=./voice_models.yaml
57
+ # Recommended for Cohere Labs partner demo (GPU Space):
58
+ # ECHOCOACH_ASR_PRESET=cohere-transcribe
59
+ # ECHOCOACH_COACH_MODEL=tiny-aya-global
60
+ # Comma-separated preset keys from models.yaml if primary coach fails to load:
61
+ # ECHOCOACH_COACH_FALLBACK=minicpm5-1b
62
  # ECHOCOACH_TTS_PRESET=piper-multilingual
63
+ # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
64
+ # Dev fallback (CPU):
65
+ # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
66
  # ECHOCOACH_COACH_MODEL=minicpm5-1b
67
  # ECHOCOACH_MAX_SECONDS=30
68
  # ECHOCOACH_CAPTURE_DEVICE= # optional ALSA/PipeWire device (e.g. pipewire, alsa_input.pci-...)
README.md CHANGED
@@ -38,10 +38,10 @@ Open [http://localhost:7860](http://localhost:7860).
38
 
39
  ### Studio UI (Off Brand track)
40
 
41
- The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + three-column workspace (Research → Slides → Voice/Coach), and `@server.api` endpoints wired to the same Python backends as Classic.
42
 
43
- - **`/`** — Studio UI (ingest sources, generate slides, TeacherVoice, EchoCoach)
44
- - **`/classic`** — full Gradio Blocks app (all tabs, settings, Chat debug)
45
 
46
  See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.
47
 
 
38
 
39
  ### Studio UI (Off Brand track)
40
 
41
+ The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + workspace (Research → Slides → Language lessons), and `@server.api` endpoints wired to the same Python backends as Classic.
42
 
43
+ - **`/`** — Studio UI (ingest sources, generate slides, **Language lessons** multilingual coach)
44
+ - **`/classic`** — full Gradio Blocks app (TeacherVoice, EchoCoach pitch analysis, settings, Chat debug)
45
 
46
  See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.
47
 
USAGE.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
4
 
5
- The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **TeacherVoice** for spoken back-and-forth tutoring, **EchoCoach** for one-shot pitch analysis, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
6
 
7
  ## Prerequisites
8
 
@@ -115,10 +115,11 @@ Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
115
 
116
  | Variable | Default | Description |
117
  | -------- | ------- | ----------- |
118
- | `ECHOCOACH_ASR_PRESET` | `whisper-cpp-tiny` | ASR preset key |
119
  | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
120
- | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | TeacherVoice streaming TTS (see below) |
121
- | `ECHOCOACH_COACH_MODEL` | `minicpm5-1b` | Text coach preset (from `models.yaml`) |
 
122
  | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
123
 
124
  **Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face — run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.
@@ -129,9 +130,39 @@ Smoke tests (analysis only, no GPU):
129
  bash scripts/echo_coach_smoke.sh
130
  ```
131
 
132
- ### TeacherVoice: spoken conversation (turn-based)
133
 
134
- The **TeacherVoice** tab is a **multi-turn voice teacher**: not full duplex like a phone call, but speak → wait → hear a reply → repeat.
135
 
136
  | Mode | Purpose |
137
  | ---- | ------- |
 
2
 
3
  How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
4
 
5
+ The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (Cohere Transcribe + Tiny Aya), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
6
 
7
  ## Prerequisites
8
 
 
115
 
116
  | Variable | Default | Description |
117
  | -------- | ------- | ----------- |
118
+ | `ECHOCOACH_ASR_PRESET` | `cohere-transcribe` | ASR preset key (Space demo); use `whisper-cpp-tiny` on CPU dev |
119
  | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
120
+ | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
121
+ | `ECHOCOACH_COACH_MODEL` | `tiny-aya-global` | Text coach preset (Tiny Aya; from `models.yaml`) |
122
+ | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
123
  | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
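
One way to read the comma-separated `ECHOCOACH_COACH_FALLBACK` value is sketched below. The helper names `coach_candidates` and `load_first_available` are hypothetical, the defaults are the ones listed in the table above, and `load` stands in for whatever function actually loads a preset from `models.yaml`.

```python
import os
from typing import Callable, List, Mapping, Optional


def coach_candidates(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the primary coach preset followed by fallbacks, without duplicates."""
    source = os.environ if env is None else env
    primary = source.get("ECHOCOACH_COACH_MODEL", "tiny-aya-global").strip()
    raw_fallbacks = source.get("ECHOCOACH_COACH_FALLBACK", "minicpm5-1b")

    ordered: List[str] = []
    for key in [primary, *(part.strip() for part in raw_fallbacks.split(","))]:
        # Skip empty entries (e.g. "a,,b") and repeated preset keys.
        if key and key not in ordered:
            ordered.append(key)
    return ordered


def load_first_available(candidates: List[str], load: Callable[[str], object]) -> str:
    """Try each preset in order and return the first key that loads."""
    errors: List[str] = []
    for key in candidates:
        try:
            load(key)
            return key
        except Exception as exc:  # noqa: BLE001 - report every failure together
            errors.append(f"{key}: {exc}")
    raise RuntimeError("No coach model could be loaded: " + "; ".join(errors))
```

Passing `env` explicitly keeps the parsing testable without touching the process environment.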
124
 
125
  **Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face — run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.
 
130
  bash scripts/echo_coach_smoke.sh
131
  ```
132
 
133
+ ### Language lessons: multilingual coach (Studio tab)
134
 
135
+ The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.
136
+
137
+ | Input | Output |
138
+ | ----- | ------ |
139
+ | Type a question | Chat bubble in target language |
140
+ | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
141
+ | **Other (text only)** language code | Tiny Aya written lesson (no Piper voice for unsupported codes) |
142
+
143
+ **Stack (Cohere Labs partner demo):** [Cohere Transcribe](https://huggingface.co/CohereLabs/c4ai-transcribe-v2) (14 voice langs) → [Tiny Aya Global / regional](https://huggingface.co/CohereLabs/tiny-aya-global) (70+ text langs) → Piper or VibeVoice Realtime for speech out.
144
+
145
+ Set Space secrets (GPU recommended):
146
+
147
+ ```bash
148
+ ECHOCOACH_ASR_PRESET=cohere-transcribe
149
+ ECHOCOACH_COACH_MODEL=tiny-aya-global
150
+ ECHOCOACH_TTS_PRESET=piper-multilingual
151
+ ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
152
+ ```
153
+
154
+ | Mode | Purpose |
155
+ | ---- | ------- |
156
+ | **Explain** | Tutor any topic in plain language |
157
+ | **Lesson coach** | Discuss and outline lesson content |
158
+
159
+ Turn-based (not full duplex): speak → wait → hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice.
160
+
161
+ Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`).
162
+
163
+ ### TeacherVoice — Classic UI (turn-based)
164
+
165
+ The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher — same pipeline as Language lessons, plus **Pitch practice** mode.
166
 
167
  | Mode | Purpose |
168
  | ---- | ------- |
apps/gradio-space/README.md CHANGED
@@ -33,8 +33,9 @@ This package uses **Gradio 6 Server mode** (`gradio.Server`):
33
 
34
  **Voice & coach**
35
 
 
36
  - `teacher_voice_turn`, `teacher_voice_audio_turn`, `teacher_voice_clear`, `teacher_voice_speak`
37
- - `load_sample_pitch`, `analyze_pitch` (language, ASR preset, `speak_rewrite`)
38
  - `recording_status`, `recording_start`, `recording_stop`
39
  - `voice_presets`
40
 
@@ -44,15 +45,23 @@ This package uses **Gradio 6 Server mode** (`gradio.Server`):
44
  - `debug_chat`
45
  - `save_upload`
46
 
47
- ## Demo script (judges)
 
 
48
 
49
  1. Open `/` — **Small Model Finetuning** project workspace
50
- 2. Paste a URL in Research → **Ingest URL** → documents appear with **RAG Active**
51
- 3. Center column **Generate Slides** → slide preview canvas, thumbnail strip, and **Outline** panel
52
- 4. Optional: expand **Research sources** → Web search or RAG modes
53
- 5. Voice view → text or **mic** → full conversation thread + **Speak full reply**
54
- 6. Coach view → **Load sample clip** or record → **Analyze pitch** (charts, transcript, VoiceOut)
55
- 7. Debug sidebar → RAG scope overrides, plain chat or corpus-grounded test with traces
56
- 8. Settings drawer → model status / reload (Classic at `/classic` still available)
 
 
 
 
 
 
57
 
58
  Space card metadata lives in the [repository root README.md](../../README.md).
 
33
 
34
  **Voice & coach**
35
 
36
+ - `language_lesson_turn` — unified text/audio turn for Language lessons (mode, language, `auto_voiceout`, coach variant)
37
  - `teacher_voice_turn`, `teacher_voice_audio_turn`, `teacher_voice_clear`, `teacher_voice_speak`
38
+ - `load_sample_pitch`, `analyze_pitch` (Classic EchoCoach; language, ASR preset, `speak_rewrite`)
39
  - `recording_status`, `recording_start`, `recording_stop`
40
  - `voice_presets`
41
 
 
45
  - `debug_chat`
46
  - `save_upload`
47
 
48
+ ## Demo script (judges) — Language lessons + Cohere stack
49
+
50
+ **Badge line:** Cohere Labs — Transcribe + Tiny Aya on one local Language lessons page.
51
 
52
  1. Open `/` — **Small Model Finetuning** project workspace
53
+ 2. **Language lessons** tab → select **French** → hold mic → ask *« Explique le fine-tuning en termes simples. »* → hear Piper/VibeVoice reply
54
+ 3. Switch to **Spanish**, type a follow-up (text in, text + audio out with **Auto-speak replies** on)
55
+ 4. Select **Other (text only)** → enter `hi` → show Tiny Aya Fire-quality written lesson (text-only banner)
56
+ 5. Toggle **Use indexed sources** after ingesting one PDF in **Research**
57
+ 6. Optional: **Generate Slides** from the Slides tab; **Classic UI** (`/classic`) for EchoCoach pitch metrics
58
+
59
+ Space secrets for GPU demo:
60
+
61
+ ```bash
62
+ ECHOCOACH_ASR_PRESET=cohere-transcribe
63
+ ECHOCOACH_COACH_MODEL=tiny-aya-global
64
+ ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
65
+ ```
66
 
67
  Space card metadata lives in the [repository root README.md](../../README.md).
apps/gradio-space/src/gradio_space/api/studio.py CHANGED
@@ -10,7 +10,7 @@ import gradio as gr
10
 
11
  from echocoach.config import get_echo_coach_config
12
  from echocoach.pipeline import run_echo_coach
13
- from echocoach.prompts import TeacherVoiceMode
14
  from echocoach.recording import (
15
  ServerRecordingError,
16
  recording_backend_status,
@@ -51,7 +51,7 @@ from gradio_space.ui.studio_html import (
51
  render_trace_details,
52
  )
53
  from gradio_space.voice_helpers import speak_last_assistant_reply
54
- from inference.config import get_app_config
55
  from inference.factory import get_backend
56
  from researchmind.config import get_config as get_research_config
57
  from researchmind.ingest import IngestPipeline
@@ -167,11 +167,93 @@ def _voice_stack_summary() -> str:
167
  f"ASR: {asr.label} ({_echo_config.asr_preset})",
168
  f"TTS: {tts.label} ({_echo_config.tts_preset})",
169
  f"Coach model: {_echo_config.coach_model}",
 
170
  f"Max recording: {_echo_config.max_seconds}s",
171
  ]
172
  return "\n".join(lines)
173
 
174
 
175
  def _paths_summary() -> str:
176
  rm = get_research_config()
177
  lines = []
@@ -549,9 +631,15 @@ def api_teacher_voice_turn(
549
  doc_ids: list[str] | None = None,
550
  language: str = "en",
551
  asr_preset: str | None = None,
 
 
 
552
  ) -> dict[str, Any]:
553
- model_key = get_active_model_key()
554
- load_error = ensure_model_loaded(model_key)
 
 
 
555
  if load_error:
556
  return err(load_error)
557
 
@@ -567,9 +655,11 @@ def api_teacher_voice_turn(
567
  language=language,
568
  topic=topic.strip() or None,
569
  backend=get_backend(model_key),
 
570
  use_rag=use_rag and mode in RAG_MODES,
571
  session_id=session_id or None,
572
  doc_ids=doc_ids or None,
 
573
  )
574
  except Exception as exc: # noqa: BLE001
575
  return err(str(exc))
@@ -577,9 +667,12 @@ def api_teacher_voice_turn(
577
  return ok(
578
  history=result.history,
579
  assistant=result.assistant_text,
580
- status=result.rag_status or "Turn complete.",
581
  voiceout_path=result.voiceout_path,
 
582
  rag_references=result.rag_references,
 
 
583
  )
584
 
585
  def api_teacher_voice_audio_turn(
@@ -592,9 +685,15 @@ def api_teacher_voice_audio_turn(
592
  doc_ids: list[str] | None = None,
593
  language: str = "en",
594
  asr_preset: str | None = None,
 
 
 
595
  ) -> dict[str, Any]:
596
- model_key = get_active_model_key()
597
- load_error = ensure_model_loaded(model_key)
 
 
 
598
  if load_error:
599
  return err(load_error)
600
 
@@ -613,10 +712,12 @@ def api_teacher_voice_audio_turn(
613
  asr_preset=preset,
614
  topic=topic.strip() or None,
615
  backend=get_backend(model_key),
 
616
  use_rag=use_rag and mode in RAG_MODES,
617
  session_id=session_id or None,
618
  doc_ids=doc_ids or None,
619
  max_turn_seconds=max_turn,
 
620
  )
621
  except Exception as exc: # noqa: BLE001
622
  return err(str(exc))
@@ -624,10 +725,60 @@ def api_teacher_voice_audio_turn(
624
  return ok(
625
  history=result.history,
626
  assistant=result.assistant_text,
627
- status=result.rag_status or "Turn complete.",
628
  voiceout_path=result.voiceout_path,
 
629
  user_text=result.user_text,
630
  rag_references=result.rag_references,
631
  )
632
 
633
 
@@ -672,8 +823,7 @@ def api_analyze_pitch(
672
  asr_preset: str | None = None,
673
  speak_rewrite: bool = False,
674
  ) -> dict[str, Any]:
675
- model_key = get_active_model_key()
676
- load_error = ensure_model_loaded(model_key)
677
  if load_error:
678
  return err(load_error)
679
 
@@ -686,6 +836,7 @@ def api_analyze_pitch(
686
  audio_path,
687
  language=language,
688
  asr_preset=preset,
 
689
  backend=get_backend(model_key),
690
  speak_rewrite=speak_rewrite,
691
  )
@@ -786,12 +937,30 @@ def api_recording_stop() -> dict[str, Any]:
786
 
787
 
788
  def api_voice_presets() -> dict[str, Any]:
 
 
 
 
 
789
  return ok(
790
  languages=[{"label": label, "value": value} for label, value in _echo_config.language_choices()],
791
  asr_presets=[{"label": label, "value": value} for label, value in _echo_config.asr_choices()],
 
 
 
792
  default_language=_echo_config.language_choices()[0][1] if _echo_config.language_choices() else "en",
793
  default_asr=_echo_config.asr_preset,
 
 
 
 
 
794
  max_seconds=_echo_config.max_seconds,
 
 
 
 
 
795
  )
796
 
797
 
@@ -917,6 +1086,38 @@ def register_studio_apis(server: gr.Server) -> None:
917
  file_paths,
918
  )
919
 
920
  @server.api(name="teacher_voice_turn")
921
  def _teacher_voice_turn(
922
  message: str,
@@ -928,6 +1129,9 @@ def register_studio_apis(server: gr.Server) -> None:
928
  doc_ids: list[str] | None = None,
929
  language: str = "en",
930
  asr_preset: str | None = None,
 
 
 
931
  ) -> dict[str, Any]:
932
  return api_teacher_voice_turn(
933
  message,
@@ -939,6 +1143,9 @@ def register_studio_apis(server: gr.Server) -> None:
939
  doc_ids,
940
  language,
941
  asr_preset,
 
 
 
942
  )
943
 
944
  @server.api(name="teacher_voice_audio_turn")
@@ -952,6 +1159,9 @@ def register_studio_apis(server: gr.Server) -> None:
952
  doc_ids: list[str] | None = None,
953
  language: str = "en",
954
  asr_preset: str | None = None,
 
 
 
955
  ) -> dict[str, Any]:
956
  return api_teacher_voice_audio_turn(
957
  audio_path,
@@ -963,6 +1173,9 @@ def register_studio_apis(server: gr.Server) -> None:
963
  doc_ids,
964
  language,
965
  asr_preset,
 
 
 
966
  )
967
 
968
  @server.api(name="teacher_voice_clear")
 
10
 
11
  from echocoach.config import get_echo_coach_config
12
  from echocoach.pipeline import run_echo_coach
13
+ from echocoach.prompts import TeacherVoiceMode, resolve_aya_preset
14
  from echocoach.recording import (
15
  ServerRecordingError,
16
  recording_backend_status,
 
51
  render_trace_details,
52
  )
53
  from gradio_space.voice_helpers import speak_last_assistant_reply
54
+ from inference.config import get_app_config, get_model_config
55
  from inference.factory import get_backend
56
  from researchmind.config import get_config as get_research_config
57
  from researchmind.ingest import IngestPipeline
 
167
  f"ASR: {asr.label} ({_echo_config.asr_preset})",
168
  f"TTS: {tts.label} ({_echo_config.tts_preset})",
169
  f"Coach model: {_echo_config.coach_model}",
170
+ f"Coach fallbacks: {', '.join(_echo_config.coach_fallbacks) or 'none'}",
171
  f"Max recording: {_echo_config.max_seconds}s",
172
  ]
173
  return "\n".join(lines)
174
 
175
 
+def _coach_model_key(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> str:
+    if coach_model and coach_model.strip():
+        key = coach_model.strip()
+    elif coach_variant and coach_variant not in ("auto", ""):
+        key = coach_variant.strip()
+    else:
+        key = resolve_aya_preset(language, coach_variant)
+    if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
+        key = "tiny-aya-global"
+    return key
+
+
+def _coach_model_label(model_key: str) -> str:
+    try:
+        return get_model_config(model_key).label
+    except Exception:
+        return model_key
+
+
+def _coach_model_candidates(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> list[str]:
+    if coach_model and coach_model.strip():
+        return [coach_model.strip()]
+    primary = _coach_model_key(None, language=language, coach_variant=coach_variant)
+    chain: list[str] = []
+    seen: set[str] = set()
+    for key in (primary, *_echo_config.coach_fallbacks):
+        if key and key not in seen:
+            seen.add(key)
+            chain.append(key)
+    return chain or [primary]
+
+
+def _ensure_coach_loaded(
+    coach_model: str | None = None,
+    *,
+    language: str = "en",
+    coach_variant: str = "auto",
+) -> tuple[str, str | None, str | None]:
+    """Load the first coach preset that succeeds. Returns (key, error, fallback_note)."""
+    candidates = _coach_model_candidates(
+        coach_model,
+        language=language,
+        coach_variant=coach_variant,
+    )
+    errors: list[str] = []
+    for index, key in enumerate(candidates):
+        load_error = ensure_model_loaded(key)
+        if not load_error:
+            if index == 0:
+                return key, None, None
+            label = _coach_model_label(key)
+            note = (
+                f"Primary coach unavailable — using fallback **{label}** (`{key}`). "
+                "Replies still follow your target language via prompts."
+            )
+            return key, None, note
+        errors.append(load_error)
+    return candidates[-1], errors[-1], None
+
+
+def _coach_turn_status(base: str | None, fallback_note: str | None) -> str:
+    status = (base or "Turn complete.").strip()
+    if fallback_note:
+        return f"{fallback_note} {status}".strip()
+    return status
+
+
+def _voice_language_codes() -> list[str]:
+    return [code for _, code in _echo_config.language_choices()]
+
+
257
  def _paths_summary() -> str:
258
  rm = get_research_config()
259
  lines = []
 
631
  doc_ids: list[str] | None = None,
632
  language: str = "en",
633
  asr_preset: str | None = None,
634
+ auto_voiceout: bool = True,
635
+ coach_model: str = "",
636
+ coach_variant: str = "auto",
637
  ) -> dict[str, Any]:
638
+ model_key, load_error, fallback_note = _ensure_coach_loaded(
639
+ coach_model or None,
640
+ language=language,
641
+ coach_variant=coach_variant,
642
+ )
643
  if load_error:
644
  return err(load_error)
645
 
 
655
  language=language,
656
  topic=topic.strip() or None,
657
  backend=get_backend(model_key),
658
+ coach_model=model_key,
659
  use_rag=use_rag and mode in RAG_MODES,
660
  session_id=session_id or None,
661
  doc_ids=doc_ids or None,
662
+ auto_voiceout=auto_voiceout,
663
  )
664
  except Exception as exc: # noqa: BLE001
665
  return err(str(exc))
 
667
  return ok(
668
  history=result.history,
669
  assistant=result.assistant_text,
670
+ status=_coach_turn_status(result.rag_status, fallback_note),
671
  voiceout_path=result.voiceout_path,
672
+ voiceout_warning=result.voiceout_warning,
673
  rag_references=result.rag_references,
674
+ coach_model=model_key,
675
+ coach_fallback=bool(fallback_note),
676
  )
677
 
678
  def api_teacher_voice_audio_turn(
 
685
  doc_ids: list[str] | None = None,
686
  language: str = "en",
687
  asr_preset: str | None = None,
688
+ auto_voiceout: bool = True,
689
+ coach_model: str = "",
690
+ coach_variant: str = "auto",
691
  ) -> dict[str, Any]:
692
+ model_key, load_error, fallback_note = _ensure_coach_loaded(
693
+ coach_model or None,
694
+ language=language,
695
+ coach_variant=coach_variant,
696
+ )
697
  if load_error:
698
  return err(load_error)
699
 
 
712
  asr_preset=preset,
713
  topic=topic.strip() or None,
714
  backend=get_backend(model_key),
715
+ coach_model=model_key,
716
  use_rag=use_rag and mode in RAG_MODES,
717
  session_id=session_id or None,
718
  doc_ids=doc_ids or None,
719
  max_turn_seconds=max_turn,
720
+ auto_voiceout=auto_voiceout,
721
  )
722
  except Exception as exc: # noqa: BLE001
723
  return err(str(exc))
 
725
  return ok(
726
  history=result.history,
727
  assistant=result.assistant_text,
728
+ status=_coach_turn_status(result.rag_status, fallback_note),
729
  voiceout_path=result.voiceout_path,
730
+ voiceout_warning=result.voiceout_warning,
731
  user_text=result.user_text,
732
  rag_references=result.rag_references,
733
+ coach_model=model_key,
734
+ coach_fallback=bool(fallback_note),
735
+ )
736
+
737
+
+def api_language_lesson_turn(
+    message: str = "",
+    audio_path: str = "",
+    mode: TeacherVoiceMode = "lesson",
+    topic: str = "",
+    session_id: str = "",
+    use_rag: bool = True,
+    history: list | None = None,
+    doc_ids: list[str] | None = None,
+    language: str = "en",
+    asr_preset: str | None = None,
+    auto_voiceout: bool = True,
+    coach_model: str = "",
+    coach_variant: str = "auto",
+) -> dict[str, Any]:
+    """Unified Language lessons turn — routes to text or audio pipeline."""
+    if audio_path and audio_path.strip():
+        return api_teacher_voice_audio_turn(
+            audio_path.strip(),
+            mode=mode,
+            topic=topic,
+            session_id=session_id,
+            use_rag=use_rag,
+            history=history,
+            doc_ids=doc_ids,
+            language=language,
+            asr_preset=asr_preset,
+            auto_voiceout=auto_voiceout,
+            coach_model=coach_model,
+            coach_variant=coach_variant,
+        )
+    return api_teacher_voice_turn(
+        message,
+        mode=mode,
+        topic=topic,
+        session_id=session_id,
+        use_rag=use_rag,
+        history=history,
+        doc_ids=doc_ids,
+        language=language,
+        asr_preset=asr_preset,
+        auto_voiceout=auto_voiceout,
+        coach_model=coach_model,
+        coach_variant=coach_variant,
     )
783
 
784
 
 
823
  asr_preset: str | None = None,
824
  speak_rewrite: bool = False,
825
  ) -> dict[str, Any]:
826
+ model_key, load_error, _fallback_note = _ensure_coach_loaded(None, language=language)
 
827
  if load_error:
828
  return err(load_error)
829
 
 
836
  audio_path,
837
  language=language,
838
  asr_preset=preset,
839
+ coach_model=model_key,
840
  backend=get_backend(model_key),
841
  speak_rewrite=speak_rewrite,
842
  )
 
937
 
938
 
 def api_voice_presets() -> dict[str, Any]:
+    tts = _echo_config.get_tts()
+    voice_langs = _voice_language_codes()
+    coach_chain = _echo_config.coach_model_chain()
+    coach_chain_labels = [_coach_model_label(key) for key in coach_chain]
+    fallback_label = coach_chain_labels[1] if len(coach_chain_labels) > 1 else None
     return ok(
         languages=[{"label": label, "value": value} for label, value in _echo_config.language_choices()],
         asr_presets=[{"label": label, "value": value} for label, value in _echo_config.asr_choices()],
+        coach_variants=[
+            {"label": "Tiny Aya Global (70+ languages)", "value": "tiny-aya-global"},
+        ],
         default_language=_echo_config.language_choices()[0][1] if _echo_config.language_choices() else "en",
         default_asr=_echo_config.asr_preset,
+        default_coach=_echo_config.coach_model,
+        coach_fallbacks=list(_echo_config.coach_fallbacks),
+        coach_chain=coach_chain,
+        coach_chain_labels=coach_chain_labels,
+        voice_languages=voice_langs,
         max_seconds=_echo_config.max_seconds,
+        voiceout_note=(
+            f"Voice in/out: {len(voice_langs)} languages via Piper · "
+            f"Coach: {coach_chain_labels[0]}"
+            + (f" (fallback: {fallback_label})" if fallback_label else "")
+        ),
     )
965
 
966
 
 
1086
  file_paths,
1087
  )
1088
 
1089
+ @server.api(name="language_lesson_turn")
1090
+ def _language_lesson_turn(
1091
+ message: str = "",
1092
+ audio_path: str = "",
1093
+ mode: Literal["explain", "lesson"] = "lesson",
1094
+ topic: str = "",
1095
+ session_id: str = "",
1096
+ use_rag: bool = True,
1097
+ history: list | None = None,
1098
+ doc_ids: list[str] | None = None,
1099
+ language: str = "en",
1100
+ asr_preset: str | None = None,
1101
+ auto_voiceout: bool = True,
1102
+ coach_model: str = "",
1103
+ coach_variant: str = "auto",
1104
+ ) -> dict[str, Any]:
1105
+ return api_language_lesson_turn(
1106
+ message,
1107
+ audio_path,
1108
+ mode,
1109
+ topic,
1110
+ session_id,
1111
+ use_rag,
1112
+ history,
1113
+ doc_ids,
1114
+ language,
1115
+ asr_preset,
1116
+ auto_voiceout,
1117
+ coach_model,
1118
+ coach_variant,
1119
+ )
1120
+
1121
  @server.api(name="teacher_voice_turn")
1122
  def _teacher_voice_turn(
1123
  message: str,
 
1129
  doc_ids: list[str] | None = None,
1130
  language: str = "en",
1131
  asr_preset: str | None = None,
1132
+ auto_voiceout: bool = True,
1133
+ coach_model: str = "",
1134
+ coach_variant: str = "auto",
1135
  ) -> dict[str, Any]:
1136
  return api_teacher_voice_turn(
1137
  message,
 
1143
  doc_ids,
1144
  language,
1145
  asr_preset,
1146
+ auto_voiceout,
1147
+ coach_model,
1148
+ coach_variant,
1149
  )
1150
 
1151
  @server.api(name="teacher_voice_audio_turn")
 
1159
  doc_ids: list[str] | None = None,
1160
  language: str = "en",
1161
  asr_preset: str | None = None,
1162
+ auto_voiceout: bool = True,
1163
+ coach_model: str = "",
1164
+ coach_variant: str = "auto",
1165
  ) -> dict[str, Any]:
1166
  return api_teacher_voice_audio_turn(
1167
  audio_path,
 
1173
  doc_ids,
1174
  language,
1175
  asr_preset,
1176
+ auto_voiceout,
1177
+ coach_model,
1178
+ coach_variant,
1179
  )
1180
 
1181
  @server.api(name="teacher_voice_clear")
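
The coach-loading helpers in this file's diff build an ordered, de-duplicated list of model presets and return the first one that loads. Below is a minimal standalone sketch of that pattern, not the project's code: `candidate_chain`, `load_first_available`, `fake_loader`, and the `"small-model"` key are hypothetical names used only for illustration, and the loader is assumed to return an error string on failure and a falsy value on success, as `ensure_model_loaded` appears to do in the diff.

```python
from typing import Callable, Iterable, Optional


def candidate_chain(primary: str, fallbacks: Iterable[str]) -> list[str]:
    """Return the primary key followed by fallbacks, skipping empties and duplicates."""
    chain: list[str] = []
    seen: set[str] = set()
    for key in (primary, *fallbacks):
        if key and key not in seen:
            seen.add(key)
            chain.append(key)
    return chain


def load_first_available(
    candidates: list[str],
    try_load: Callable[[str], Optional[str]],
) -> tuple[str, Optional[str], bool]:
    """Try each preset in order; return (key, error, used_fallback).

    Assumes `candidates` is non-empty.
    """
    last_error: Optional[str] = None
    for index, key in enumerate(candidates):
        error = try_load(key)  # falsy result means the load succeeded
        if not error:
            return key, None, index > 0
        last_error = error
    # Every candidate failed: report the last key tried and its error.
    return candidates[-1], last_error, False


# Hypothetical loader for the example: only "small-model" can be loaded.
def fake_loader(key: str) -> Optional[str]:
    return None if key == "small-model" else f"cannot load {key}"


chain = candidate_chain("tiny-aya-global", ["small-model", "tiny-aya-global"])
result = load_first_available(chain, fake_loader)
```

Returning a flag (or, in the diff, a note string) alongside the key lets the caller surface the fallback in the turn status instead of failing silently.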
apps/gradio-space/static/studio/index.html CHANGED
@@ -34,7 +34,7 @@
34
  <nav class="sidebar-nav">
35
  <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
36
  <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
37
- <button type="button" class="nav-item" data-view="voice"><span class="material-symbols-outlined">mic</span>Voice</button>
38
  <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">bug_report</span>Debug</button>
39
  <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
40
  <a href="/classic" class="nav-item nav-link"><span class="material-symbols-outlined">open_in_new</span>Classic UI</a>
@@ -300,125 +300,104 @@
300
  </section>
301
 
302
  <section class="col col-studio">
303
- <div class="voice-layout view-voice-only">
304
- <aside class="voice-rail">
305
- <div class="card voice-rag-card">
306
- <p class="card-title">RAG Scope</p>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
307
  <label class="toggle-row">
308
  <span>Answer from my indexed sources</span>
309
- <input id="use-rag" type="checkbox" checked />
310
  </label>
311
- <p class="status-text">Ground teacher replies in your workspace documents when enabled.</p>
312
  </div>
313
- <div class="card voice-rail-controls">
314
  <p class="card-title">Mode</p>
315
- <div class="mode-cards voice-mode-cards" id="voice-modes">
316
  <button type="button" class="mode-card" data-mode="explain">Explain</button>
317
- <button type="button" class="mode-card active" data-mode="lesson">Lesson</button>
318
- <button type="button" class="mode-card" data-mode="pitch">Practice</button>
319
  </div>
320
- <label class="field voice-topic-wrap" id="voice-topic-wrap">
321
- <span>Focus topic</span>
322
- <input id="voice-topic" type="text" class="input" placeholder="Uses workspace topic when empty" />
323
  </label>
324
- <details class="voice-rag-sources" id="voice-rag-sources">
325
  <summary>Add sources (optional)</summary>
326
  <p class="status-text">Discover or ingest sources to ground answers in your library.</p>
327
  <div class="ingest-action-row">
328
- <button type="button" id="btn-voice-discover" class="btn btn-secondary">Discover on web</button>
329
- <button type="button" id="btn-voice-auto-ingest" class="btn btn-secondary">Auto-ingest</button>
330
  </div>
331
- <div id="voice-url-choices-panel" class="url-choices-panel hidden">
332
- <div id="voice-url-choices-list" class="url-choices-list"></div>
333
  </div>
334
  <label class="field">
335
  <span>Paste URLs (one per line)</span>
336
- <textarea id="voice-urls-text" class="input" rows="2" placeholder="https://…"></textarea>
337
  </label>
338
  <label class="upload-zone upload-zone-compact">
339
- <input id="voice-ingest-file" type="file" accept=".pdf,.docx" multiple hidden />
340
  <span class="material-symbols-outlined">upload_file</span>
341
  <span>Upload PDF or Doc</span>
342
  </label>
343
- <button type="button" id="btn-voice-ingest" class="btn btn-secondary btn-block">Ingest sources</button>
344
- <p id="voice-ingest-status" class="status-text"></p>
345
  </details>
346
  </div>
347
  </aside>
348
- <div class="voice-main">
349
- <div class="card voice-main-card">
350
- <div class="voice-card-head">
351
- <h2 class="section-label">Teacher Voice</h2>
352
- <p class="voice-card-desc">Talk with the teacher using text or voice grounded in your sources when RAG is on.</p>
353
  </div>
354
- <div id="voice-chat-messages" class="research-chat-messages voice-chat-messages">
355
- <p class="research-chat-empty">Type a message or record audio, then send.</p>
356
  </div>
357
- <div class="voice-compose" id="voice-panel">
358
  <label class="field">
359
- <span>Ask the teacher</span>
360
- <textarea id="voice-message" class="input" rows="2" placeholder="What is the difference between pretraining and finetuning a small model?"></textarea>
361
  </label>
362
- <div class="voice-input-toolbar">
363
- <div class="recording-row voice-recording-row">
364
- <button type="button" id="btn-voice-record-start" class="btn btn-secondary">Start mic</button>
365
- <button type="button" id="btn-voice-record-stop" class="btn btn-secondary" disabled>Stop mic</button>
366
- <input id="voice-audio-upload" type="file" accept="audio/*" class="input input-compact" />
367
- </div>
368
- <p id="voice-record-status" class="status-text voice-record-status"></p>
369
- </div>
370
- <div class="voice-send-row">
371
- <button type="button" id="btn-voice-send" class="btn btn-secondary">Send text</button>
372
- <button type="button" id="btn-voice-audio-send" class="btn btn-primary">Send voice turn</button>
373
- </div>
374
- <p id="voice-turn-status" class="status-text"></p>
375
- <div class="voice-replay-row">
376
- <button type="button" id="btn-voice-speak-full" class="btn btn-secondary">Speak full reply</button>
377
- <button type="button" id="btn-voice-speak-quick" class="btn btn-secondary">Speak first sentence</button>
378
- <button type="button" id="btn-voice-clear" class="btn btn-ghost">Clear conversation</button>
379
- </div>
380
- <div id="voice-audio-out" class="voice-audio-out"></div>
381
- </div>
382
- </div>
383
- <details class="card voice-pitch-analysis hidden" id="voice-pitch-analysis" open>
384
- <summary class="voice-pitch-summary">
385
- <span class="section-label">Deep pitch analysis</span>
386
- <span class="voice-pitch-summary-hint">Pace, fillers, charts, and spoken rewrite</span>
387
- </summary>
388
- <div class="coach-panel-wrap">
389
- <p class="coach-card-desc">Record or upload a short monologue (up to 30s), then analyze for metrics and feedback.</p>
390
- <div class="coach-capture-row">
391
- <div class="coach-capture-controls">
392
- <div class="recording-row coach-recording-row">
393
- <button type="button" id="btn-coach-record-start" class="btn btn-secondary">Start mic</button>
394
- <button type="button" id="btn-coach-record-stop" class="btn btn-secondary" disabled>Stop mic</button>
395
- <button type="button" id="btn-coach-sample" class="btn btn-ghost">Load sample</button>
396
- </div>
397
- <p id="coach-record-status" class="status-text coach-record-status"></p>
398
- </div>
399
- <label class="field coach-upload-field">
400
- <span>Upload pitch (WAV)</span>
401
- <input id="coach-audio" type="file" accept="audio/*" />
402
  </label>
403
  </div>
404
- <div class="controls-grid coach-presets">
405
- <label class="field">
406
- <span>Language</span>
407
- <select id="coach-language" class="input"></select>
408
- </label>
409
- <label class="field">
410
- <span>ASR preset</span>
411
- <select id="coach-asr" class="input"></select>
412
  </label>
 
413
  </div>
414
- <label class="toggle-row coach-voiceout-toggle">
415
- <span>Speak full rewrite (VoiceOut)</span>
416
- <input id="coach-speak-rewrite" type="checkbox" />
417
- </label>
418
- <button type="button" id="btn-analyze" class="btn btn-primary btn-block coach-analyze-btn">Analyze pitch</button>
419
- <div id="coach-panel" class="coach-results-panel"></div>
420
  </div>
421
- </details>
 
 
 
 
422
  </div>
423
  </div>
424
  </section>
 
34
  <nav class="sidebar-nav">
35
  <button type="button" class="nav-item" data-view="research"><span class="material-symbols-outlined">search</span>Research</button>
36
  <button type="button" class="nav-item active" data-view="slides"><span class="material-symbols-outlined">present_to_all</span>Slides</button>
37
+ <button type="button" class="nav-item" data-view="language-lessons"><span class="material-symbols-outlined">translate</span>Language lessons</button>
38
  <button type="button" class="nav-item" data-view="debug"><span class="material-symbols-outlined">bug_report</span>Debug</button>
39
  <button type="button" id="btn-open-settings" class="nav-item"><span class="material-symbols-outlined">settings</span>Settings</button>
40
  <a href="/classic" class="nav-item nav-link"><span class="material-symbols-outlined">open_in_new</span>Classic UI</a>
 
300
  </section>
301
 
302
  <section class="col col-studio">
303
+ <div class="lessons-layout view-lessons-only">
304
+ <aside class="lessons-rail">
305
+ <div class="card lessons-rail-card">
306
+ <p class="card-title">Target language</p>
307
+ <label class="field">
308
+ <span>Lesson language</span>
309
+ <select id="lessons-language" class="input"></select>
310
+ </label>
311
+ <label class="field lessons-other-lang hidden" id="lessons-other-lang-wrap">
312
+ <span>Text-only language code</span>
313
+ <input id="lessons-other-lang" type="text" class="input" placeholder="e.g. hi, sw" maxlength="8" />
314
+ </label>
315
+ <p id="lessons-voiceout-note" class="status-text"></p>
316
+ <p class="status-text lessons-coach-model">Coach: Tiny Aya Global (70+ languages)</p>
317
+ <input type="hidden" id="lessons-coach-variant" value="tiny-aya-global" />
318
+ </div>
319
+ <div class="card lessons-rag-card">
320
+ <p class="card-title">RAG scope</p>
321
  <label class="toggle-row">
322
  <span>Answer from my indexed sources</span>
323
+ <input id="lessons-use-rag" type="checkbox" checked />
324
  </label>
325
+ <p class="status-text">Ground lesson replies in your workspace documents when enabled.</p>
326
  </div>
327
+ <div class="card lessons-rail-controls">
328
  <p class="card-title">Mode</p>
329
+ <div class="mode-cards lessons-mode-cards" id="lessons-modes">
330
  <button type="button" class="mode-card" data-mode="explain">Explain</button>
331
+ <button type="button" class="mode-card active" data-mode="lesson">Lesson coach</button>
 
332
  </div>
333
+ <label class="field lessons-topic-wrap">
334
+ <span>Lesson topic</span>
335
+ <input id="lessons-topic" type="text" class="input" placeholder="Uses workspace topic when empty" />
336
  </label>
337
+ <details class="lessons-rag-sources" id="lessons-rag-sources">
338
  <summary>Add sources (optional)</summary>
339
  <p class="status-text">Discover or ingest sources to ground answers in your library.</p>
340
  <div class="ingest-action-row">
341
+ <button type="button" id="btn-lessons-discover" class="btn btn-secondary">Discover on web</button>
342
+ <button type="button" id="btn-lessons-auto-ingest" class="btn btn-secondary">Auto-ingest</button>
343
  </div>
344
+ <div id="lessons-url-choices-panel" class="url-choices-panel hidden">
345
+ <div id="lessons-url-choices-list" class="url-choices-list"></div>
346
  </div>
347
  <label class="field">
348
  <span>Paste URLs (one per line)</span>
349
+ <textarea id="lessons-urls-text" class="input" rows="2" placeholder="https://…"></textarea>
350
  </label>
351
  <label class="upload-zone upload-zone-compact">
352
+ <input id="lessons-ingest-file" type="file" accept=".pdf,.docx" multiple hidden />
353
  <span class="material-symbols-outlined">upload_file</span>
354
  <span>Upload PDF or Doc</span>
355
  </label>
356
+ <button type="button" id="btn-lessons-ingest" class="btn btn-secondary btn-block">Ingest sources</button>
357
+ <p id="lessons-ingest-status" class="status-text"></p>
358
  </details>
359
  </div>
360
  </aside>
361
+ <div class="lessons-main">
362
+ <div class="card lessons-main-card">
363
+ <div class="lessons-card-head">
364
+ <h2 class="section-label">Language lessons</h2>
365
+ <p class="lessons-card-desc">Learn in your language: type, hold the mic, or upload audio. Replies can speak back automatically.</p>
366
  </div>
367
+ <div id="lessons-chat-messages" class="research-chat-messages lessons-chat-messages">
368
+ <p class="research-chat-empty">Choose a language, then type, speak, or upload audio to start your lesson.</p>
369
  </div>
370
+ <div class="lessons-compose" id="lessons-panel">
371
  <label class="field">
372
+ <span>Your message</span>
373
+ <textarea id="lessons-message" class="input" rows="2" placeholder="What is the difference between pretraining and finetuning a small model?"></textarea>
374
  </label>
375
+ <div class="lessons-input-toolbar">
376
+ <button type="button" id="btn-lessons-hold-mic" class="btn btn-secondary lessons-hold-mic">Hold to speak</button>
377
+ <button type="button" id="btn-lessons-record-start" class="btn btn-secondary btn-compact">Start mic</button>
378
+ <button type="button" id="btn-lessons-record-stop" class="btn btn-secondary btn-compact" disabled>Stop mic</button>
379
+ <label class="lessons-upload-btn btn btn-secondary">
380
+ <span class="material-symbols-outlined">upload_file</span>
381
+ Upload audio
382
+ <input id="lessons-audio-upload" type="file" accept="audio/*" hidden />
 
383
  </label>
384
  </div>
385
+ <p id="lessons-record-status" class="status-text lessons-record-status"></p>
386
+ <div class="lessons-send-row">
387
+ <button type="button" id="btn-lessons-send" class="btn btn-primary">Send</button>
388
+ <label class="toggle-row lessons-auto-speak">
389
+ <span>Auto-speak replies</span>
390
+ <input id="lessons-auto-speak" type="checkbox" checked />
 
 
391
  </label>
392
+ <button type="button" id="btn-lessons-clear" class="btn btn-ghost">Clear</button>
393
  </div>
394
+ <p id="lessons-turn-status" class="status-text"></p>
 
 
 
 
 
395
  </div>
396
+ </div>
397
+ <p class="lessons-classic-link status-text">
398
+ Pitch metrics and monologue analysis live in
399
+ <a href="/classic">Classic UI → EchoCoach</a>.
400
+ </p>
401
  </div>
402
  </div>
403
  </section>
apps/gradio-space/static/studio/studio.css CHANGED
@@ -387,11 +387,11 @@ body {
387
  .region-loading-host,
388
  .card-ingest,
389
  .card-chat,
390
- .voice-main-card,
391
  .coach-panel-wrap,
392
  .coach-debug-card,
393
  .controls-panel,
394
- .voice-rail-controls {
395
  position: relative;
396
  }
397
 
@@ -1017,26 +1017,26 @@ body {
1017
  .research-layout { grid-template-columns: 1fr; }
1018
  }
1019
 
1020
- .workspace[data-view="voice"] .col-research,
1021
- .workspace[data-view="voice"] .col-slides { display: none; }
1022
 
1023
- .workspace[data-view="voice"] .col-debug { display: none; }
1024
 
1025
- .view-voice-only { display: none; }
1026
 
1027
- .workspace[data-view="voice"] {
1028
  grid-template-columns: minmax(0, 1fr);
1029
  max-width: 1280px;
1030
  gap: 1.25rem;
1031
  }
1032
 
1033
- .workspace[data-view="voice"] .col-studio {
1034
  grid-column: 1 / -1;
1035
  width: 100%;
1036
  min-width: 0;
1037
  }
1038
 
1039
- .workspace[data-view="voice"] .voice-layout {
1040
  display: grid;
1041
  grid-template-columns: minmax(260px, 0.78fr) minmax(0, 1.22fr);
1042
  gap: 1.25rem;
@@ -1044,29 +1044,29 @@ body {
1044
  width: 100%;
1045
  }
1046
 
1047
- .workspace[data-view="voice"] .voice-rail {
1048
  display: flex;
1049
  flex-direction: column;
1050
  gap: 1rem;
1051
  min-width: 0;
1052
  }
1053
 
1054
- .workspace[data-view="voice"] .voice-main {
1055
  min-width: 0;
1056
  display: flex;
1057
  flex-direction: column;
1058
  gap: 1rem;
1059
  }
1060
 
1061
- .workspace[data-view="voice"] .voice-pitch-analysis {
1062
  margin: 0;
1063
  }
1064
 
1065
- .workspace[data-view="voice"] .voice-pitch-analysis[open] .voice-pitch-summary {
1066
  margin-bottom: 0.75rem;
1067
  }
1068
 
1069
- .workspace[data-view="voice"] .voice-pitch-summary {
1070
  cursor: pointer;
1071
  list-style: none;
1072
  display: flex;
@@ -1074,63 +1074,63 @@ body {
1074
  gap: 0.2rem;
1075
  }
1076
 
1077
- .workspace[data-view="voice"] .voice-pitch-summary::-webkit-details-marker {
1078
  display: none;
1079
  }
1080
 
1081
- .workspace[data-view="voice"] .voice-pitch-summary-hint {
1082
  font-size: 0.84rem;
1083
  color: var(--secondary);
1084
  font-weight: 400;
1085
  }
1086
 
1087
- .workspace[data-view="voice"] .voice-pitch-analysis .coach-panel-wrap {
1088
  padding-top: 0.25rem;
1089
  }
1090
 
1091
- .workspace[data-view="voice"] .voice-discuss-btn {
1092
  margin-top: 0.75rem;
1093
  }
1094
 
1095
- .workspace[data-view="voice"] .coach-results-panel {
1096
  min-height: 80px;
1097
  margin-top: 0.75rem;
1098
  overflow-y: auto;
1099
  }
1100
 
1101
- .workspace[data-view="voice"] .coach-results-panel:not(:empty) {
1102
  border-top: 1px solid var(--outline-variant);
1103
  padding-top: 0.75rem;
1104
  }
1105
 
1106
- .workspace[data-view="voice"] .voice-main-card {
1107
  display: flex;
1108
  flex-direction: column;
1109
  }
1110
 
1111
- .workspace[data-view="voice"] .voice-compose {
1112
  display: flex;
1113
  flex-direction: column;
1114
  gap: 0.5rem;
1115
  }
1116
 
1117
- .workspace[data-view="voice"] .voice-compose .field {
1118
  margin: 0;
1119
  }
1120
 
1121
- .workspace[data-view="voice"] .voice-compose textarea {
1122
  min-height: 3.25rem;
1123
  resize: vertical;
1124
  }
1125
 
1126
- .workspace[data-view="voice"] .voice-rail .voice-mode-cards {
1127
  flex-direction: row;
1128
  flex-wrap: wrap;
1129
  gap: 0.35rem;
1130
  margin-bottom: 0.75rem;
1131
  }
1132
 
1133
- .workspace[data-view="voice"] .voice-rail .voice-mode-cards .mode-card {
1134
  flex: 1 1 calc(33.333% - 0.35rem);
1135
  text-align: center;
1136
  justify-content: center;
@@ -1139,27 +1139,27 @@ body {
1139
  padding-right: 0.5rem;
1140
  }
1141
 
1142
- .workspace[data-view="voice"] .voice-rail-controls .voice-topic-wrap {
1143
  margin: 0 0 0.75rem;
1144
  }
1145
 
1146
- .workspace[data-view="voice"] .voice-rag-sources {
1147
  margin: 0;
1148
  }
1149
 
1150
- .workspace[data-view="voice"] .voice-rag-sources summary {
1151
  cursor: pointer;
1152
  font-weight: 600;
1153
  font-size: 0.82rem;
1154
  }
1155
 
1156
- .workspace[data-view="voice"] .voice-chat-messages {
1157
  min-height: 160px;
1158
  max-height: min(260px, 32vh);
1159
  margin: 0 0 0.75rem;
1160
  }
1161
 
1162
- .workspace[data-view="voice"] .voice-input-toolbar {
1163
  padding: 0.65rem 0.75rem;
1164
  border: 1px solid var(--outline-variant);
1165
  border-radius: var(--radius-lg);
@@ -1167,31 +1167,31 @@ body {
1167
  margin-bottom: 0.65rem;
1168
  }
1169
 
1170
- .workspace[data-view="voice"] .voice-recording-row {
1171
  margin: 0;
1172
  }
1173
 
1174
- .workspace[data-view="voice"] .voice-record-status {
1175
  margin: 0.35rem 0 0;
1176
  min-height: 1.1rem;
1177
  }
1178
 
1179
- .workspace[data-view="voice"] .voice-send-row {
1180
  display: grid;
1181
  grid-template-columns: 1fr 1fr;
1182
  gap: 0.5rem;
1183
  margin-bottom: 0.35rem;
1184
  }
1185
 
1186
- .workspace[data-view="voice"] .voice-card-head {
1187
  margin-bottom: 0.85rem;
1188
  }
1189
 
1190
- .workspace[data-view="voice"] .voice-card-head .section-label {
1191
  margin-bottom: 0.35rem;
1192
  }
1193
 
1194
- .voice-card-desc {
1195
  margin: 0;
1196
  font-size: 0.84rem;
1197
  line-height: 1.45;
@@ -1199,24 +1199,24 @@ body {
1199
  }
1200
 
1201
  @media (max-width: 960px) {
1202
- .workspace[data-view="voice"] .voice-layout {
1203
  grid-template-columns: 1fr;
1204
  max-width: 640px;
1205
  margin-left: auto;
1206
  margin-right: auto;
1207
  }
1208
 
1209
- .workspace[data-view="voice"] .voice-rail .voice-mode-cards {
1210
  flex-direction: column;
1211
  }
1212
 
1213
- .workspace[data-view="voice"] .voice-rail .voice-mode-cards .mode-card {
1214
  flex: 1 1 auto;
1215
  text-align: left;
1216
  justify-content: space-between;
1217
  }
1218
 
1219
- .workspace[data-view="voice"] .voice-send-row {
1220
  grid-template-columns: 1fr;
1221
  }
1222
  }
@@ -1421,22 +1421,22 @@ body {
1421
  max-width: 160px;
1422
  }
1423
 
1424
- .voice-audio-out audio,
1425
  .studio-coach-voiceout audio {
1426
  width: 100%;
1427
  margin-top: 0.5rem;
1428
  }
1429
 
1430
- .voice-chat-messages {
1431
  max-height: 220px;
1432
  margin: 0.75rem 0;
1433
  }
1434
 
1435
- .voice-rag-sources {
1436
  margin: 0.75rem 0;
1437
  }
1438
 
1439
- .voice-rag-sources summary {
1440
  cursor: pointer;
1441
  font-weight: 600;
1442
  font-size: 0.875rem;
@@ -1450,14 +1450,14 @@ body {
1450
  color: var(--on-surface-variant);
1451
  }
1452
 
1453
- .voice-replay-row {
1454
  display: flex;
1455
  flex-wrap: wrap;
1456
  gap: 0.5rem;
1457
  margin-top: 0.5rem;
1458
  }
1459
 
1460
- .voice-replay-row .btn-ghost {
1461
  margin-left: auto;
1462
  }
1463
 
@@ -1578,3 +1578,72 @@ body {
1578
  max-height: 320px;
1579
  }
1580
 
387
  .region-loading-host,
388
  .card-ingest,
389
  .card-chat,
390
+ .lessons-main-card,
391
  .coach-panel-wrap,
392
  .coach-debug-card,
393
  .controls-panel,
394
+ .lessons-rail-controls {
395
  position: relative;
396
  }
397
 
 
1017
  .research-layout { grid-template-columns: 1fr; }
1018
  }
1019
 
1020
+ .workspace[data-view="language-lessons"] .col-research,
1021
+ .workspace[data-view="language-lessons"] .col-slides { display: none; }
1022
 
1023
+ .workspace[data-view="language-lessons"] .col-debug { display: none; }
1024
 
1025
+ .view-lessons-only { display: none; }
1026
 
1027
+ .workspace[data-view="language-lessons"] {
1028
  grid-template-columns: minmax(0, 1fr);
1029
  max-width: 1280px;
1030
  gap: 1.25rem;
1031
  }
1032
 
1033
+ .workspace[data-view="language-lessons"] .col-studio {
1034
  grid-column: 1 / -1;
1035
  width: 100%;
1036
  min-width: 0;
1037
  }
1038
 
1039
+ .workspace[data-view="language-lessons"] .lessons-layout {
1040
  display: grid;
1041
  grid-template-columns: minmax(260px, 0.78fr) minmax(0, 1.22fr);
1042
  gap: 1.25rem;
 
1044
  width: 100%;
1045
  }
1046
 
1047
+ .workspace[data-view="language-lessons"] .lessons-rail {
1048
  display: flex;
1049
  flex-direction: column;
1050
  gap: 1rem;
1051
  min-width: 0;
1052
  }
1053
 
1054
+ .workspace[data-view="language-lessons"] .lessons-main {
1055
  min-width: 0;
1056
  display: flex;
1057
  flex-direction: column;
1058
  gap: 1rem;
1059
  }
1060
 
1061
+ .workspace[data-view="language-lessons"] .lessons-pitch-analysis {
1062
  margin: 0;
1063
  }
1064
 
1065
+ .workspace[data-view="language-lessons"] .lessons-pitch-analysis[open] .lessons-pitch-summary {
1066
  margin-bottom: 0.75rem;
1067
  }
1068
 
1069
+ .workspace[data-view="language-lessons"] .lessons-pitch-summary {
1070
  cursor: pointer;
1071
  list-style: none;
1072
  display: flex;
 
1074
  gap: 0.2rem;
1075
  }
1076
 
1077
+ .workspace[data-view="language-lessons"] .lessons-pitch-summary::-webkit-details-marker {
1078
  display: none;
1079
  }
1080
 
1081
+ .workspace[data-view="language-lessons"] .lessons-pitch-summary-hint {
1082
  font-size: 0.84rem;
1083
  color: var(--secondary);
1084
  font-weight: 400;
1085
  }
1086
 
1087
+ .workspace[data-view="language-lessons"] .lessons-pitch-analysis .coach-panel-wrap {
1088
  padding-top: 0.25rem;
1089
  }
1090
 
1091
+ .workspace[data-view="language-lessons"] .lessons-discuss-btn {
1092
  margin-top: 0.75rem;
1093
  }
1094
 
1095
+ .workspace[data-view="language-lessons"] .coach-results-panel {
1096
  min-height: 80px;
1097
  margin-top: 0.75rem;
1098
  overflow-y: auto;
1099
  }
1100
 
1101
+ .workspace[data-view="language-lessons"] .coach-results-panel:not(:empty) {
1102
  border-top: 1px solid var(--outline-variant);
1103
  padding-top: 0.75rem;
1104
  }
1105
 
1106
+ .workspace[data-view="language-lessons"] .lessons-main-card {
1107
  display: flex;
1108
  flex-direction: column;
1109
  }
1110
 
1111
+ .workspace[data-view="language-lessons"] .lessons-compose {
1112
  display: flex;
1113
  flex-direction: column;
1114
  gap: 0.5rem;
1115
  }
1116
 
1117
+ .workspace[data-view="language-lessons"] .lessons-compose .field {
1118
  margin: 0;
1119
  }
1120
 
1121
+ .workspace[data-view="language-lessons"] .lessons-compose textarea {
1122
  min-height: 3.25rem;
1123
  resize: vertical;
1124
  }
1125
 
1126
+ .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards {
1127
  flex-direction: row;
1128
  flex-wrap: wrap;
1129
  gap: 0.35rem;
1130
  margin-bottom: 0.75rem;
1131
  }
1132
 
1133
+ .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards .mode-card {
1134
  flex: 1 1 calc(33.333% - 0.35rem);
1135
  text-align: center;
1136
  justify-content: center;
 
1139
  padding-right: 0.5rem;
1140
  }
1141
 
1142
+ .workspace[data-view="language-lessons"] .lessons-rail-controls .lessons-topic-wrap {
1143
  margin: 0 0 0.75rem;
1144
  }
1145
 
1146
+ .workspace[data-view="language-lessons"] .lessons-rag-sources {
1147
  margin: 0;
1148
  }
1149
 
1150
+ .workspace[data-view="language-lessons"] .lessons-rag-sources summary {
1151
  cursor: pointer;
1152
  font-weight: 600;
1153
  font-size: 0.82rem;
1154
  }
1155
 
1156
+ .workspace[data-view="language-lessons"] .lessons-chat-messages {
1157
  min-height: 160px;
1158
  max-height: min(260px, 32vh);
1159
  margin: 0 0 0.75rem;
1160
  }
1161
 
1162
+ .workspace[data-view="language-lessons"] .lessons-input-toolbar {
1163
  padding: 0.65rem 0.75rem;
1164
  border: 1px solid var(--outline-variant);
1165
  border-radius: var(--radius-lg);
 
1167
  margin-bottom: 0.65rem;
1168
  }
1169
 
1170
+ .workspace[data-view="language-lessons"] .lessons-recording-row {
1171
  margin: 0;
1172
  }
1173
 
1174
+ .workspace[data-view="language-lessons"] .lessons-record-status {
1175
  margin: 0.35rem 0 0;
1176
  min-height: 1.1rem;
1177
  }
1178
 
1179
+ .workspace[data-view="language-lessons"] .lessons-send-row {
1180
  display: grid;
1181
  grid-template-columns: 1fr 1fr;
1182
  gap: 0.5rem;
1183
  margin-bottom: 0.35rem;
1184
  }
1185
 
1186
+ .workspace[data-view="language-lessons"] .lessons-card-head {
1187
  margin-bottom: 0.85rem;
1188
  }
1189
 
1190
+ .workspace[data-view="language-lessons"] .lessons-card-head .section-label {
1191
  margin-bottom: 0.35rem;
1192
  }
1193
 
1194
+ .lessons-card-desc {
1195
  margin: 0;
1196
  font-size: 0.84rem;
1197
  line-height: 1.45;
 
1199
  }
1200
 
1201
  @media (max-width: 960px) {
1202
+ .workspace[data-view="language-lessons"] .lessons-layout {
1203
  grid-template-columns: 1fr;
1204
  max-width: 640px;
1205
  margin-left: auto;
1206
  margin-right: auto;
1207
  }
1208
 
1209
+ .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards {
1210
  flex-direction: column;
1211
  }
1212
 
1213
+ .workspace[data-view="language-lessons"] .lessons-rail .lessons-mode-cards .mode-card {
1214
  flex: 1 1 auto;
1215
  text-align: left;
1216
  justify-content: space-between;
1217
  }
1218
 
1219
+ .workspace[data-view="language-lessons"] .lessons-send-row {
1220
  grid-template-columns: 1fr;
1221
  }
1222
  }
 
1421
  max-width: 160px;
1422
  }
1423
 
1424
+ .lessons-audio-out audio,
1425
  .studio-coach-voiceout audio {
1426
  width: 100%;
1427
  margin-top: 0.5rem;
1428
  }
1429
 
1430
+ .lessons-chat-messages {
1431
  max-height: 220px;
1432
  margin: 0.75rem 0;
1433
  }
1434
 
1435
+ .lessons-rag-sources {
1436
  margin: 0.75rem 0;
1437
  }
1438
 
1439
+ .lessons-rag-sources summary {
1440
  cursor: pointer;
1441
  font-weight: 600;
1442
  font-size: 0.875rem;
 
1450
  color: var(--on-surface-variant);
1451
  }
1452
 
1453
+ .lessons-replay-row {
1454
  display: flex;
1455
  flex-wrap: wrap;
1456
  gap: 0.5rem;
1457
  margin-top: 0.5rem;
1458
  }
1459
 
1460
+ .lessons-replay-row .btn-ghost {
1461
  margin-left: auto;
1462
  }
1463
 
 
1578
  max-height: 320px;
1579
  }
1580
 
1581
+ .lessons-rail-card .field + .field {
1582
+ margin-top: 0.65rem;
1583
+ }
1584
+
1585
+ .lessons-input-toolbar {
1586
+ display: flex;
1587
+ flex-wrap: wrap;
1588
+ gap: 0.5rem;
1589
+ align-items: center;
1590
+ margin-top: 0.5rem;
1591
+ }
1592
+
1593
+ .lessons-hold-mic.is-recording {
1594
+ background: var(--primary-container);
1595
+ color: var(--on-primary-container);
1596
+ }
1597
+
1598
+ .lessons-upload-btn {
1599
+ cursor: pointer;
1600
+ display: inline-flex;
1601
+ align-items: center;
1602
+ gap: 0.35rem;
1603
+ }
1604
+
1605
+ .lessons-auto-speak {
1606
+ margin: 0;
1607
+ flex: 1 1 auto;
1608
+ justify-content: flex-end;
1609
+ }
1610
+
1611
+ .lessons-send-row {
1612
+ display: flex;
1613
+ flex-wrap: wrap;
1614
+ gap: 0.65rem;
1615
+ align-items: center;
1616
+ margin-top: 0.65rem;
1617
+ }
1618
+
1619
+ .lessons-chat-messages .chat-audio-inline {
1620
+ margin-top: 0.5rem;
1621
+ width: 100%;
1622
+ }
1623
+
1624
+ .lessons-classic-link {
1625
+ margin-top: 0.75rem;
1626
+ text-align: center;
1627
+ }
1628
+
1629
+ .lessons-classic-link a {
1630
+ color: var(--primary);
1631
+ }
1632
+
1633
+ .lessons-message-user::before {
1634
+ content: "You · ";
1635
+ font-weight: 600;
1636
+ opacity: 0.75;
1637
+ }
1638
+
1639
+ .lessons-message-assistant::before {
1640
+ content: "Teacher · ";
1641
+ font-weight: 600;
1642
+ opacity: 0.75;
1643
+ }
1644
+
1645
+ .btn-compact {
1646
+ padding-inline: 0.65rem;
1647
+ font-size: 0.82rem;
1648
+ }
1649
+
apps/gradio-space/static/studio/studio.js CHANGED
@@ -40,11 +40,11 @@ const state = {
40
  selectedUrls: [],
41
  slideDiscoveredUrls: [],
42
  slideSelectedUrls: [],
43
- voiceDiscoveredUrls: [],
44
- voiceSelectedUrls: [],
45
  researchChatHistory: [],
46
  debugChatHistory: [],
47
- voiceMode: "lesson",
48
  history: [],
49
  downloads: null,
50
  client: null,
@@ -55,9 +55,8 @@ const state = {
55
  recordingTarget: null,
56
  browserRecorder: null,
57
  browserRecordChunks: [],
58
- pendingVoiceAudioPath: null,
59
- pendingCoachAudioPath: null,
60
- lastPitchAnalysis: null,
61
  useBrowserMic: true,
62
  };
63
 
@@ -223,16 +222,37 @@ function renderResearchUrlChoices(urls, selected) {
223
  if (getIngestWorkflow() === "select") panel?.classList.remove("hidden");
224
  }
225
 
226
- function voiceEffectiveTopic() {
227
- if (state.voiceMode === "pitch") return effectiveTopic("");
228
- return effectiveTopic($("#voice-topic")?.value || "");
229
  }
230
 
231
- function voiceUseRag() {
232
- return $("#use-rag").checked && state.voiceMode !== "pitch";
233
  }
234
 
235
- function voiceMessageText(content) {
236
  if (content == null) return "";
237
  if (typeof content === "string") return content;
238
  if (Array.isArray(content)) {
@@ -253,8 +273,14 @@ function ingestSucceeded(status) {
253
  );
254
  }
255
 
256
- function applyVoiceIngestResult(data) {
257
- $("#voice-ingest-status").textContent = stripMd(data.status || "Ingest complete.");
258
  state.workspaceSessionId = data.session_id || state.workspaceSessionId;
259
  $("#workspace-session").value = state.workspaceSessionId;
260
  if (data.documents_html) {
@@ -264,20 +290,21 @@ function applyVoiceIngestResult(data) {
264
  updateResearchRagBadge();
265
  updateResearchDocCount((data.documents || []).length);
266
  if (ingestSucceeded(data.status)) {
267
- $("#use-rag").checked = true;
 
268
  }
269
  }
270
 
271
- async function discoverVoiceSources() {
272
- const topic = voiceEffectiveTopic();
273
  if (!topic) {
274
- showError("Set a focus or workspace topic before discovering sources.");
275
  return;
276
  }
277
- await withRegionLoading($(".voice-rail-controls"), "Discovering sources…", async () => {
278
  const data = await callApi("discover_sources", [topic, state.workspaceSessionId]);
279
- $("#voice-ingest-status").textContent = stripMd(data.status || "Discovery complete.");
280
- renderVoiceUrlChoices(data.urls || [], data.selected_urls || data.urls || []);
281
  if (data.session_id) {
282
  state.workspaceSessionId = data.session_id;
283
  $("#workspace-session").value = data.session_id;
@@ -286,32 +313,32 @@ async function discoverVoiceSources() {
286
  });
287
  }
288
 
289
- async function autoVoiceIngest() {
290
- const topic = voiceEffectiveTopic();
291
  if (!topic) {
292
- showError("Set a focus or workspace topic before auto-ingest.");
293
  return;
294
  }
295
- await withRegionLoading($(".voice-rail-controls"), "Auto-ingesting sources…", async () => {
296
  const data = await callApi("auto_search_ingest", [topic, state.workspaceSessionId]);
297
- applyVoiceIngestResult(data);
298
- state.voiceDiscoveredUrls = [];
299
- state.voiceSelectedUrls = [];
300
- renderVoiceUrlChoices([], []);
301
  await refreshWorkspaceSessions(state.workspaceSessionId);
302
  });
303
  }
304
 
305
- async function ingestVoiceSources() {
306
- const topic = voiceEffectiveTopic();
307
- const pasted = $("#voice-urls-text")?.value.trim() || "";
308
- const selected = getSelectedDiscoveredUrls("#voice-url-choices-list");
309
- const files = $("#voice-ingest-file")?.files;
310
  if (!pasted && !selected.length && !files?.length) {
311
  showError("Add URLs, select suggested sources, or upload a file — then ingest.");
312
  return;
313
  }
314
- await withRegionLoading($(".voice-rail-controls"), "Ingesting sources…", async () => {
315
  const paths = [];
316
  if (files?.length) {
317
  for (const file of files) {
@@ -325,35 +352,40 @@ async function ingestVoiceSources() {
325
  selected,
326
  paths,
327
  ]);
328
- applyVoiceIngestResult(data);
329
- if (pasted) $("#voice-urls-text").value = "";
330
- if (files?.length) $("#voice-ingest-file").value = "";
331
  await refreshWorkspaceSessions(state.workspaceSessionId);
332
  });
333
  }
334
 
335
- function syncVoiceModeUi() {
336
- const ragMode = state.voiceMode === "explain" || state.voiceMode === "lesson";
337
- const practiceMode = state.voiceMode === "pitch";
338
- $("#voice-topic-wrap")?.classList.toggle("hidden", !ragMode);
339
- $("#voice-rag-sources")?.classList.toggle("hidden", !ragMode);
340
- $(".voice-rag-card")?.classList.toggle("hidden", practiceMode);
341
- $("#voice-pitch-analysis")?.classList.toggle("hidden", !practiceMode);
342
  const placeholders = {
343
  explain: "e.g. How does finetuning differ from pretraining?",
344
  lesson: "What is the difference between pretraining and finetuning a small model?",
345
- pitch: "e.g. Here is my opening line — how can I improve it?",
346
  };
347
- const messageEl = $("#voice-message");
348
- if (messageEl) messageEl.placeholder = placeholders[state.voiceMode] || placeholders.lesson;
349
  }
350
 
351
- function renderVoiceChat() {
352
- const container = $("#voice-chat-messages");
353
  if (!container) return;
354
  if (!state.history.length) {
355
  container.innerHTML =
356
- '<p class="research-chat-empty">Type a message or record audio, then send.</p>';
357
  return;
358
  }
359
  const parts = [];
@@ -361,9 +393,13 @@ function renderVoiceChat() {
361
  if (item && typeof item === "object" && item.role) {
362
  const role = item.role === "user" ? "user" : "assistant";
363
  const label = role === "user" ? "You" : "Teacher";
364
- let body = renderMarkdownLite(voiceMessageText(item.content));
365
  if (role === "assistant" && item.rag_references) {
366
- body += `<div class="voice-rag-refs">${renderMarkdownLite(item.rag_references)}</div>`;
367
  }
368
  parts.push(
369
  `<div class="research-chat-bubble research-chat-${role}"><div class="research-chat-role">${label}</div><div class="research-chat-body">${body}</div></div>`
@@ -380,18 +416,50 @@ function renderVoiceChat() {
380
  container.scrollTop = container.scrollHeight;
381
  }
382
 
383
- function renderVoiceUrlChoices(urls, selected) {
384
- state.voiceDiscoveredUrls = urls || [];
385
- state.voiceSelectedUrls = selected?.length ? selected : [...state.voiceDiscoveredUrls];
386
  renderUrlChoices(
387
  urls,
388
  selected,
389
- "#voice-url-choices-list",
390
- "#voice-url-choices-panel",
391
- { discovered: state.voiceDiscoveredUrls, selected: state.voiceSelectedUrls }
392
  );
393
  }
394
 
395
  function renderSlideUrlChoices(urls, selected) {
396
  state.slideDiscoveredUrls = urls || [];
397
  state.slideSelectedUrls = selected?.length ? selected : [...state.slideDiscoveredUrls];
@@ -961,23 +1029,30 @@ async function refreshDocuments() {
961
  }
962
  }
963
 
964
- async function initVoicePresets() {
965
  const data = await callApi("voice_presets", []);
966
  state.voicePresets = data;
967
- const langSelect = $("#coach-language");
968
- const asrSelect = $("#coach-asr");
969
  if (langSelect) {
970
- langSelect.innerHTML = (data.languages || [])
971
  .map((o) => `<option value="${o.value}">${o.label}</option>`)
972
  .join("");
 
973
  langSelect.value = data.default_language || "en";
974
  }
975
- if (asrSelect) {
976
- asrSelect.innerHTML = (data.asr_presets || [])
977
- .map((o) => `<option value="${o.value}">${o.label}</option>`)
978
- .join("");
979
- asrSelect.value = data.default_asr || "";
 
 
980
  }
981
  }
982
 
983
  async function initSettings() {
@@ -1033,10 +1108,10 @@ async function initWorkspace() {
1033
  updateResearchRagBadge();
1034
  await refreshWorkspaceSessions();
1035
  await refreshDocuments();
1036
- await initVoicePresets();
1037
  await initSettings();
1038
- syncVoiceModeUi();
1039
- renderVoiceChat();
1040
  await refreshDebugDocuments();
1041
  const recStatus = await callApi("recording_status", []);
1042
  state.useBrowserMic = !recStatus.backend || /unavailable|no capture/i.test(recStatus.message || "");
@@ -1055,7 +1130,7 @@ async function generateSlides() {
1055
  const topic = effectiveTopic($("#lesson-topic").value);
1056
  const grade = $("#lesson-grade").value;
1057
  const slideCount = Number($("#slide-count").value);
1058
- const useRag = $("#use-rag").checked;
1059
  const docIds = effectiveDocIds([]);
1060
  const sourceMode = $("#slide-source-mode")?.value || "";
1061
  const searchWorkflow = $("#slide-search-workflow")?.value || "two_step";
@@ -1155,63 +1230,41 @@ async function generateSlides() {
1155
  );
1156
  }
1157
 
1158
- function renderVoiceReply(data, { keepAudio = false } = {}) {
1159
  state.history = data.history ?? state.history;
1160
- if (data.rag_references && state.history.length) {
1161
  const last = state.history[state.history.length - 1];
1162
  if (last && typeof last === "object" && last.role === "assistant") {
1163
- last.rag_references = data.rag_references;
 
1164
  }
1165
  }
1166
- renderVoiceChat();
1167
  if (data.status) {
1168
- $("#voice-turn-status").textContent = stripMd(data.status);
1169
- }
1170
- const out = $("#voice-audio-out");
1171
- if (data.voiceout_path) {
1172
- out.innerHTML = `<audio controls src="${fileUrl(data.voiceout_path)}"></audio>`;
1173
- } else if (!keepAudio) {
1174
- out.innerHTML = "";
1175
  }
1176
  }
1177
 
1178
- async function sendVoiceTurn() {
1179
- const message = $("#voice-message").value.trim();
1180
- if (!message) {
1181
- showError("Enter a message first.");
1182
- return;
1183
- }
1184
- const topic = voiceEffectiveTopic();
1185
- const useRag = voiceUseRag();
1186
- const docIds = effectiveDocIds([]);
1187
- const language = state.voicePresets?.default_language || "en";
1188
- await withRegionLoading($(".voice-main-card"), "Teacher is thinking…", async () => {
1189
- const data = await callApi("teacher_voice_turn", [
1190
- message,
1191
- state.voiceMode,
1192
- topic,
1193
- state.workspaceSessionId,
1194
- useRag,
1195
- state.history,
1196
- docIds,
1197
- language,
1198
- null,
1199
- ]);
1200
- $("#voice-message").value = "";
1201
- renderVoiceReply(data);
1202
- });
1203
  }
1204
 
1205
- async function sendVoiceAudioTurn(audioPath) {
1206
- const topic = voiceEffectiveTopic();
1207
- const useRag = voiceUseRag();
1208
  const docIds = effectiveDocIds([]);
1209
- const language = state.voicePresets?.default_language || "en";
1210
  const asr = state.voicePresets?.default_asr || null;
1211
- await withRegionLoading($(".voice-main-card"), "Processing voice…", async () => {
1212
- const data = await callApi("teacher_voice_audio_turn", [
1213
- audioPath,
1214
- state.voiceMode,
1215
  topic,
1216
  state.workspaceSessionId,
1217
  useRag,
@@ -1219,81 +1272,100 @@ async function sendVoiceAudioTurn(audioPath) {
1219
  docIds,
1220
  language,
1221
  asr,
1222
  ]);
1223
- if (data.user_text) $("#voice-message").value = data.user_text;
1224
- renderVoiceReply(data);
1225
  });
1226
  }
1227
 
1228
- async function speakVoiceReply(firstSentenceOnly) {
1229
- const language = state.voicePresets?.default_language || "en";
1230
- const data = await callApi("teacher_voice_speak", [state.history, language, firstSentenceOnly]);
1231
- $("#voice-turn-status").textContent = stripMd(data.status || "VoiceOut ready.");
1232
- if (data.voiceout_path) {
1233
- $("#voice-audio-out").innerHTML = `<audio controls src="${fileUrl(data.voiceout_path)}"></audio>`;
 
 
 
1234
  }
1235
  }
1236
 
1237
- async function clearVoiceConversation() {
1238
  const data = await callApi("teacher_voice_clear", []);
1239
  state.history = [];
1240
- renderVoiceChat();
1241
- $("#voice-message").value = "";
1242
- $("#voice-turn-status").textContent = stripMd(data.status || "Conversation cleared.");
1243
- $("#voice-audio-out").innerHTML = "";
1244
- }
1245
-
1246
- async function loadSamplePitch() {
1247
- const data = await callApi("load_sample_pitch", []);
1248
- state.pendingCoachAudioPath = data.audio_path;
1249
- $("#coach-record-status").textContent = stripMd(data.status || "Sample clip loaded.");
1250
- }
1251
-
1252
- async function analyzePitchWithPath(audioPath) {
1253
- const language = $("#coach-language")?.value || "en";
1254
- const asr = $("#coach-asr")?.value || null;
1255
- const speakRewrite = $("#coach-speak-rewrite")?.checked || false;
1256
- await withRegionLoading($("#voice-pitch-analysis"), "Analyzing pitch…", async () => {
1257
- const data = await callApi("analyze_pitch", [audioPath, language, asr, speakRewrite]);
1258
- state.lastPitchAnalysis = data;
1259
- const panel = $("#coach-panel");
1260
- panel.innerHTML = data.coach_panel_html || "";
1261
- const discussBtn = document.createElement("button");
1262
- discussBtn.type = "button";
1263
- discussBtn.className = "btn btn-secondary voice-discuss-btn";
1264
- discussBtn.textContent = "Discuss in chat";
1265
- discussBtn.addEventListener("click", () => discussPitchInChat().catch(() => {}));
1266
- if (data.transcript_html || data.report_md || data.tip) {
1267
- panel.appendChild(discussBtn);
1268
- }
1269
- });
1270
  }
1271
 
1272
- function discussPitchInChat() {
1273
- const data = state.lastPitchAnalysis;
1274
- if (!data) return;
1275
- const parts = [];
1276
- if (data.tip) parts.push(`Coach tip: ${stripMd(data.tip)}`);
1277
- if (data.report_md) parts.push(stripMd(data.report_md).slice(0, 800));
1278
- const prompt =
1279
- parts.length > 0
1280
- ? `Here is my pitch analysis. Help me improve based on this feedback:\n\n${parts.join("\n\n")}`
1281
- : "I just ran pitch analysis — what should I work on next?";
1282
- $("#voice-message").value = prompt;
1283
- $("#voice-message").focus();
1284
- const chat = $("#voice-chat-messages");
1285
- if (chat) chat.scrollIntoView({ behavior: "smooth", block: "nearest" });
1286
- }
1287
-
1288
- async function analyzePitch() {
1289
- let path = state.pendingCoachAudioPath;
1290
- const file = $("#coach-audio").files?.[0];
1291
  if (file) path = await uploadFile(file);
1292
  if (!path) {
1293
- showError("Record or upload audio to analyze.");
1294
  return;
1295
  }
1296
- await analyzePitchWithPath(path);
1297
  }
1298
 
1299
  async function startBrowserRecording(statusEl) {
@@ -1367,23 +1439,11 @@ async function stopRecording(statusEl, startBtn, stopBtn) {
1367
  path = data.path;
1368
  if (statusEl) statusEl.textContent = stripMd(data.status || "Recording saved.");
1369
  }
1370
- if (state.recordingTarget === "voice") state.pendingVoiceAudioPath = path;
1371
- if (state.recordingTarget === "coach") state.pendingCoachAudioPath = path;
1372
  state.recordingTarget = null;
1373
  return path;
1374
  }
1375
 
1376
- async function sendVoiceFromRecording() {
1377
- let path = state.pendingVoiceAudioPath;
1378
- const file = $("#voice-audio-upload").files?.[0];
1379
- if (file) path = await uploadFile(file);
1380
- if (!path) {
1381
- showError("Record or upload audio first.");
1382
- return;
1383
- }
1384
- await sendVoiceAudioTurn(path);
1385
- }
1386
-
1387
  function bindUi() {
1388
  $("#slide-count").addEventListener("input", (e) => {
1389
  $("#slide-count-val").textContent = e.target.value;
@@ -1450,18 +1510,52 @@ function bindUi() {
1450
  });
1451
 
1452
  $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
1453
- $("#btn-voice-send").addEventListener("click", () => sendVoiceTurn().catch(() => {}));
1454
- $("#btn-voice-audio-send").addEventListener("click", () => sendVoiceFromRecording().catch(() => {}));
1455
- $("#btn-voice-discover")?.addEventListener("click", () => discoverVoiceSources().catch(() => {}));
1456
- $("#btn-voice-auto-ingest")?.addEventListener("click", () => autoVoiceIngest().catch(() => {}));
1457
- $("#btn-voice-ingest")?.addEventListener("click", () => ingestVoiceSources().catch(() => {}));
1458
- $("#voice-ingest-file")?.addEventListener("change", (e) => ingestVoiceSources().catch(() => {}));
1459
- $("#btn-voice-speak-full")?.addEventListener("click", () => speakVoiceReply(false).catch(() => {}));
1460
- $("#btn-voice-speak-quick")?.addEventListener("click", () => speakVoiceReply(true).catch(() => {}));
1461
- $("#btn-voice-clear")?.addEventListener("click", () => clearVoiceConversation().catch(() => {}));
1462
- $("#btn-coach-sample")?.addEventListener("click", () => loadSamplePitch().catch(() => {}));
1463
- $("#btn-analyze").addEventListener("click", () => analyzePitch().catch(() => {}));
1464
  $("#btn-debug-send").addEventListener("click", () => sendDebugMessage().catch(() => {}));
 
1465
  $("#debug-session")?.addEventListener("change", () => refreshDebugDocuments().catch(() => {}));
1466
  $("#debug-refresh-sessions")?.addEventListener("click", () => {
1467
  refreshDebugSessions().catch(() => {});
@@ -1475,19 +1569,6 @@ function bindUi() {
1475
  }
1476
  });
1477
 
1478
- $("#btn-voice-record-start")?.addEventListener("click", () =>
1479
- startRecording("voice", $("#voice-record-status"), $("#btn-voice-record-start"), $("#btn-voice-record-stop")).catch(() => {})
1480
- );
1481
- $("#btn-voice-record-stop")?.addEventListener("click", () =>
1482
- stopRecording($("#voice-record-status"), $("#btn-voice-record-start"), $("#btn-voice-record-stop")).catch(() => {})
1483
- );
1484
- $("#btn-coach-record-start")?.addEventListener("click", () =>
1485
- startRecording("coach", $("#coach-record-status"), $("#btn-coach-record-start"), $("#btn-coach-record-stop")).catch(() => {})
1486
- );
1487
- $("#btn-coach-record-stop")?.addEventListener("click", () =>
1488
- stopRecording($("#coach-record-status"), $("#btn-coach-record-start"), $("#btn-coach-record-stop")).catch(() => {})
1489
- );
1490
-
1491
  $("#btn-export").addEventListener("click", () => {
1492
  const p = state.downloads?.pptx;
1493
  if (p) window.open(fileUrl(p), "_blank");
@@ -1506,16 +1587,16 @@ function bindUi() {
1506
  refreshDocuments().catch(() => {});
1507
  });
1508
 
1509
- document.querySelectorAll(".mode-card").forEach((btn) => {
1510
  btn.addEventListener("click", () => {
1511
- document.querySelectorAll(".mode-card").forEach((b) => b.classList.remove("active"));
1512
  btn.classList.add("active");
1513
- state.voiceMode = btn.dataset.mode;
1514
- syncVoiceModeUi();
1515
  });
1516
  });
1517
 
1518
- syncVoiceModeUi();
1519
  }
1520
 
1521
  bindUi();
 
40
  selectedUrls: [],
41
  slideDiscoveredUrls: [],
42
  slideSelectedUrls: [],
43
+ lessonsDiscoveredUrls: [],
44
+ lessonsSelectedUrls: [],
45
  researchChatHistory: [],
46
  debugChatHistory: [],
47
+ lessonsMode: "lesson",
48
  history: [],
49
  downloads: null,
50
  client: null,
 
55
  recordingTarget: null,
56
  browserRecorder: null,
57
  browserRecordChunks: [],
58
+ pendingLessonsAudioPath: null,
59
+ holdMicActive: false,
 
60
  useBrowserMic: true,
61
  };
62
 
 
222
  if (getIngestWorkflow() === "select") panel?.classList.remove("hidden");
223
  }
224
 
225
+ function lessonsEffectiveTopic() {
226
+ return effectiveTopic($("#lessons-topic")?.value || "");
 
227
  }
228
 
229
+ function lessonsUseRag() {
230
+ return Boolean($("#lessons-use-rag")?.checked);
231
  }
232
 
233
+ function lessonsLanguage() {
234
+ const select = $("#lessons-language");
235
+ if (!select) return "en";
236
+ if (select.value === "other") {
237
+ return ($("#lessons-other-lang")?.value.trim() || "en").toLowerCase();
238
+ }
239
+ return select.value || "en";
240
+ }
241
+
242
+ function lessonsCoachVariant() {
243
+ return $("#lessons-coach-variant")?.value || "tiny-aya-global";
244
+ }
245
+
246
+ function lessonsAutoSpeak() {
247
+ return Boolean($("#lessons-auto-speak")?.checked);
248
+ }
249
+
250
+ function lessonsHasVoiceOut(language) {
251
+ const code = (language || "en").split("-")[0];
252
+ return (state.voicePresets?.voice_languages || []).includes(code);
253
+ }
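Review note: `lessonsLanguage()` and `lessonsHasVoiceOut()` read the DOM and `state.voicePresets` directly, which makes them hard to exercise outside the page. As a rough sketch only, the same rules could be written as pure functions. The names `resolveLessonLanguage` and `hasVoiceOut`, and the sample language lists in the usage lines, are hypothetical and not part of this commit.

```javascript
// Hypothetical pure-function variants of lessonsLanguage() and
// lessonsHasVoiceOut(). Inputs are passed in explicitly instead of being
// read from the DOM or from the global state object.

function resolveLessonLanguage(selectValue, otherValue) {
  // "other" means the user typed a free-form code; fall back to "en" when empty.
  if (selectValue === "other") {
    return ((otherValue || "").trim() || "en").toLowerCase();
  }
  return selectValue || "en";
}

function hasVoiceOut(language, voiceLanguages) {
  // Compare on the primary subtag only, so "pt-BR" is looked up as "pt".
  const code = (language || "en").split("-")[0];
  return (voiceLanguages || []).includes(code);
}

// Example usage with made-up values:
const lang = resolveLessonLanguage("other", "  SW ");
const canSpeak = hasVoiceOut(lang, ["en", "fr"]);
console.log(lang, canSpeak);
```

Keeping the fallback to `"en"` in one place mirrors what the diff already does in both helpers; the sketch simply moves the inputs into parameters.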
254
+
255
+ function chatMessageText(content) {
256
  if (content == null) return "";
257
  if (typeof content === "string") return content;
258
  if (Array.isArray(content)) {
 
273
  );
274
  }
275
 
276
+ function chatMessageAudio(content) {
277
+ if (!Array.isArray(content)) return null;
278
+ const filePart = content.find((part) => part && typeof part === "object" && part.path);
279
+ return filePart?.path || null;
280
+ }
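Review note: `chatMessageAudio()` assumes that a message's `content` may be an array whose parts are either strings or objects carrying a `path`. The sketch below restates that lookup under a hypothetical name (`findAudioPath`) with made-up sample messages, to show which shapes yield a path and which yield `null`. The sample paths are illustrative, not values produced by the app.

```javascript
// Hypothetical restatement of chatMessageAudio() from the diff:
// return the path of the first object part that has one, else null.
function findAudioPath(content) {
  if (!Array.isArray(content)) return null;
  const filePart = content.find(
    (part) => part && typeof part === "object" && part.path
  );
  return filePart?.path || null;
}

// Made-up message contents covering the main cases.
const textOnly = "Bonjour";
const mixedParts = ["Bonjour", { path: "/tmp/reply.wav" }];
const noFilePart = [null, "Bonjour", { text: "no path here" }];

console.log(findAudioPath(textOnly));
console.log(findAudioPath(mixedParts));
console.log(findAudioPath(noFilePart));
```

Plain-string content and arrays without a file part both resolve to `null`, so a renderer can skip the inline audio element in those cases.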
281
+
282
+ function applyLessonsIngestResult(data) {
283
+ $("#lessons-ingest-status").textContent = stripMd(data.status || "Ingest complete.");
284
  state.workspaceSessionId = data.session_id || state.workspaceSessionId;
285
  $("#workspace-session").value = state.workspaceSessionId;
286
  if (data.documents_html) {
 
290
  updateResearchRagBadge();
291
  updateResearchDocCount((data.documents || []).length);
292
  if (ingestSucceeded(data.status)) {
293
+ const rag = $("#lessons-use-rag");
294
+ if (rag) rag.checked = true;
295
  }
296
  }
297
 
298
+ async function discoverLessonsSources() {
299
+ const topic = lessonsEffectiveTopic();
300
  if (!topic) {
301
+ showError("Set a lesson or workspace topic before discovering sources.");
302
  return;
303
  }
304
+ await withRegionLoading($(".lessons-rail-controls"), "Discovering sources…", async () => {
305
  const data = await callApi("discover_sources", [topic, state.workspaceSessionId]);
306
+ $("#lessons-ingest-status").textContent = stripMd(data.status || "Discovery complete.");
307
+ renderLessonsUrlChoices(data.urls || [], data.selected_urls || data.urls || []);
308
  if (data.session_id) {
309
  state.workspaceSessionId = data.session_id;
310
  $("#workspace-session").value = data.session_id;
 
313
  });
314
  }
315
 
316
+ async function autoLessonsIngest() {
317
+ const topic = lessonsEffectiveTopic();
318
  if (!topic) {
319
+ showError("Set a lesson or workspace topic before auto-ingest.");
320
  return;
321
  }
322
+ await withRegionLoading($(".lessons-rail-controls"), "Auto-ingesting sources…", async () => {
323
  const data = await callApi("auto_search_ingest", [topic, state.workspaceSessionId]);
324
+ applyLessonsIngestResult(data);
325
+ state.lessonsDiscoveredUrls = [];
326
+ state.lessonsSelectedUrls = [];
327
+ renderLessonsUrlChoices([], []);
328
  await refreshWorkspaceSessions(state.workspaceSessionId);
329
  });
330
  }
331
 
332
+ async function ingestLessonsSources() {
333
+ const topic = lessonsEffectiveTopic();
334
+ const pasted = $("#lessons-urls-text")?.value.trim() || "";
335
+ const selected = getSelectedDiscoveredUrls("#lessons-url-choices-list");
336
+ const files = $("#lessons-ingest-file")?.files;
337
  if (!pasted && !selected.length && !files?.length) {
338
  showError("Add URLs, select suggested sources, or upload a file — then ingest.");
339
  return;
340
  }
341
+ await withRegionLoading($(".lessons-rail-controls"), "Ingesting sources…", async () => {
342
  const paths = [];
343
  if (files?.length) {
344
  for (const file of files) {
 
352
  selected,
353
  paths,
354
  ]);
355
+ applyLessonsIngestResult(data);
356
+ if (pasted) $("#lessons-urls-text").value = "";
357
+ if (files?.length) $("#lessons-ingest-file").value = "";
358
  await refreshWorkspaceSessions(state.workspaceSessionId);
359
  });
360
  }
361
 
362
+ function syncLessonsModeUi() {
363
  const placeholders = {
364
  explain: "e.g. How does finetuning differ from pretraining?",
365
  lesson: "What is the difference between pretraining and finetuning a small model?",
 
366
  };
367
+ const messageEl = $("#lessons-message");
368
+ if (messageEl) messageEl.placeholder = placeholders[state.lessonsMode] || placeholders.lesson;
369
  }
370
 
371
+ function syncLessonsLanguageUi() {
372
+ const isOther = $("#lessons-language")?.value === "other";
373
+ $("#lessons-other-lang-wrap")?.classList.toggle("hidden", !isOther);
374
+ const lang = lessonsLanguage();
375
+ const note = state.voicePresets?.voiceout_note || "";
376
+ const voiceHint = lessonsHasVoiceOut(lang)
377
+ ? note
378
+ : "VoiceOut not available for this language — text replies only.";
379
+ const noteEl = $("#lessons-voiceout-note");
380
+ if (noteEl) noteEl.textContent = voiceHint;
381
+ }
382
+
383
+ function renderLessonsChat() {
384
+ const container = $("#lessons-chat-messages");
385
  if (!container) return;
386
  if (!state.history.length) {
387
  container.innerHTML =
388
+ '<p class="research-chat-empty">Choose a language, then type, speak, or upload audio to start your lesson.</p>';
389
  return;
390
  }
391
  const parts = [];
 
393
  if (item && typeof item === "object" && item.role) {
394
  const role = item.role === "user" ? "user" : "assistant";
395
  const label = role === "user" ? "You" : "Teacher";
396
+ let body = renderMarkdownLite(chatMessageText(item.content));
397
+ const audioPath = chatMessageAudio(item.content) || item.voiceout_path || null;
398
+ if (audioPath) {
399
+ body += `<audio class="chat-audio-inline" controls autoplay src="${fileUrl(audioPath)}"></audio>`;
400
+ }
401
  if (role === "assistant" && item.rag_references) {
402
+ body += `<div class="lessons-rag-refs">${renderMarkdownLite(item.rag_references)}</div>`;
403
  }
404
  parts.push(
405
  `<div class="research-chat-bubble research-chat-${role}"><div class="research-chat-role">${label}</div><div class="research-chat-body">${body}</div></div>`
 
416
  container.scrollTop = container.scrollHeight;
417
  }
418
 
419
+ function renderLessonsUrlChoices(urls, selected) {
420
+ state.lessonsDiscoveredUrls = urls || [];
421
+ state.lessonsSelectedUrls = selected?.length ? selected : [...state.lessonsDiscoveredUrls];
422
  renderUrlChoices(
423
  urls,
424
  selected,
425
+ "#lessons-url-choices-list",
426
+ "#lessons-url-choices-panel",
427
+ { discovered: state.lessonsDiscoveredUrls, selected: state.lessonsSelectedUrls }
428
  );
429
  }
430
 
431
+ function applyVoiceIngestResult(data) {
432
+ applyLessonsIngestResult(data);
433
+ }
434
+
435
+ async function discoverVoiceSources() {
436
+ return discoverLessonsSources();
437
+ }
438
+
439
+ async function autoVoiceIngest() {
440
+ return autoLessonsIngest();
441
+ }
442
+
443
+ async function ingestVoiceSources() {
444
+ return ingestLessonsSources();
445
+ }
446
+
447
+ function syncVoiceModeUi() {
448
+ syncLessonsModeUi();
449
+ }
450
+
451
+ function renderVoiceChat() {
452
+ renderLessonsChat();
453
+ }
454
+
455
+ function renderVoiceUrlChoices(urls, selected) {
456
+ renderLessonsUrlChoices(urls, selected);
457
+ }
458
+
459
+ function voiceMessageText(content) {
460
+ return chatMessageText(content);
461
+ }
462
+
463
  function renderSlideUrlChoices(urls, selected) {
464
  state.slideDiscoveredUrls = urls || [];
465
  state.slideSelectedUrls = selected?.length ? selected : [...state.slideDiscoveredUrls];
 
1029
  }
1030
  }
1031
 
1032
+ async function initLanguageLessons() {
1033
  const data = await callApi("voice_presets", []);
1034
  state.voicePresets = data;
1035
+ const langSelect = $("#lessons-language");
 
1036
  if (langSelect) {
1037
+ const opts = (data.languages || [])
1038
  .map((o) => `<option value="${o.value}">${o.label}</option>`)
1039
  .join("");
1040
+ langSelect.innerHTML = `${opts}<option value="other">Other (text only)</option>`;
1041
  langSelect.value = data.default_language || "en";
1042
  }
1043
+ const coachEl = document.querySelector(".lessons-coach-model");
1044
+ if (coachEl && data.coach_chain_labels?.length) {
1045
+ const primary = data.coach_chain_labels[0];
1046
+ const fallback = data.coach_chain_labels[1];
1047
+ coachEl.textContent = fallback
1048
+ ? `Coach: ${primary} (auto-fallback: ${fallback})`
1049
+ : `Coach: ${primary}`;
1050
  }
1051
+ syncLessonsLanguageUi();
1052
+ }
1053
+
1054
+ async function initVoicePresets() {
1055
+ return initLanguageLessons();
1056
  }
1057
 
1058
  async function initSettings() {
 
1108
  updateResearchRagBadge();
1109
  await refreshWorkspaceSessions();
1110
  await refreshDocuments();
1111
+ await initLanguageLessons();
1112
  await initSettings();
1113
+ syncLessonsModeUi();
1114
+ renderLessonsChat();
1115
  await refreshDebugDocuments();
1116
  const recStatus = await callApi("recording_status", []);
1117
  state.useBrowserMic = !recStatus.backend || /unavailable|no capture/i.test(recStatus.message || "");
 
1130
  const topic = effectiveTopic($("#lesson-topic").value);
1131
  const grade = $("#lesson-grade").value;
1132
  const slideCount = Number($("#slide-count").value);
1133
+ const useRag = Boolean($("#lessons-use-rag")?.checked);
1134
  const docIds = effectiveDocIds([]);
1135
  const sourceMode = $("#slide-source-mode")?.value || "";
1136
  const searchWorkflow = $("#slide-search-workflow")?.value || "two_step";
 
1230
  );
1231
  }
1232
 
1233
+ function renderLessonsReply(data) {
1234
  state.history = data.history ?? state.history;
1235
+ if (state.history.length) {
1236
  const last = state.history[state.history.length - 1];
1237
  if (last && typeof last === "object" && last.role === "assistant") {
1238
+ if (data.rag_references) last.rag_references = data.rag_references;
1239
+ if (data.voiceout_path && lessonsAutoSpeak()) last.voiceout_path = data.voiceout_path;
1240
  }
1241
  }
1242
+ renderLessonsChat();
1243
  if (data.status) {
1244
+ const statusEl = $("#lessons-turn-status");
1245
+ if (statusEl) statusEl.textContent = stripMd(data.status);
1246
  }
1247
  }
1248
 
1249
+ function renderVoiceReply(data, options) {
1250
+ renderLessonsReply(data, options);
1251
  }
1252
 
1253
+ async function sendLanguageLessonTurn({ message = "", audioPath = "" } = {}) {
1254
+ const topic = lessonsEffectiveTopic();
1255
+ const useRag = lessonsUseRag();
1256
  const docIds = effectiveDocIds([]);
1257
+ const language = lessonsLanguage();
1258
  const asr = state.voicePresets?.default_asr || null;
1259
+ const autoVoiceout = lessonsAutoSpeak() && lessonsHasVoiceOut(language);
1260
+ const coachVariant = lessonsCoachVariant();
1261
+ const loadingLabel = message ? "Teacher is thinking…" : audioPath ? "Processing audio…" : "Sending…";
1262
+
1263
+ await withRegionLoading($(".lessons-main-card"), loadingLabel, async () => {
1264
+ const data = await callApi("language_lesson_turn", [
1265
+ message,
1266
+ audioPath || "",
1267
+ state.lessonsMode,
1268
  topic,
1269
  state.workspaceSessionId,
1270
  useRag,
 
1272
  docIds,
1273
  language,
1274
  asr,
1275
+ autoVoiceout,
1276
+ "",
1277
+ coachVariant,
1278
  ]);
1279
+ if (data.user_text) {
1280
+ $("#lessons-message").value = data.user_text;
1281
+ } else if (message) {
1282
+ $("#lessons-message").value = "";
1283
+ }
1284
+ renderLessonsReply(data);
1285
  });
1286
  }
1287
 
1288
+ async function sendLessonsTurn() {
1289
+ const message = $("#lessons-message")?.value.trim() || "";
1290
+ let audioPath = state.pendingLessonsAudioPath;
1291
+ const file = $("#lessons-audio-upload")?.files?.[0];
1292
+ if (file) audioPath = await uploadFile(file);
1293
+ if (message) {
1294
+ await sendLanguageLessonTurn({ message });
1295
+ state.pendingLessonsAudioPath = null;
1296
+ return;
1297
  }
1298
+ if (audioPath) {
1299
+ await sendLanguageLessonTurn({ audioPath });
1300
+ state.pendingLessonsAudioPath = null;
1301
+ if ($("#lessons-audio-upload")) $("#lessons-audio-upload").value = "";
1302
+ return;
1303
+ }
1304
+ showError("Type a message, hold the mic, or upload audio.");
1305
  }
1306
 
1307
+ async function sendVoiceTurn() {
1308
+ return sendLessonsTurn();
1309
+ }
1310
+
1311
+ async function sendVoiceAudioTurn(audioPath) {
1312
+ return sendLanguageLessonTurn({ audioPath });
1313
+ }
1314
+
1315
+ async function clearLessonsConversation() {
1316
  const data = await callApi("teacher_voice_clear", []);
1317
  state.history = [];
1318
+ renderLessonsChat();
1319
+ if ($("#lessons-message")) $("#lessons-message").value = "";
1320
+ const statusEl = $("#lessons-turn-status");
1321
+ if (statusEl) statusEl.textContent = stripMd(data.status || "Conversation cleared.");
1322
  }
1323
 
1324
+ async function clearVoiceConversation() {
1325
+ return clearLessonsConversation();
1326
+ }
1327
+
1328
+ async function startLessonsHoldMic(e) {
1329
+ if (state.holdMicActive) return;
1330
+ state.holdMicActive = true;
1331
+ e?.preventDefault();
1332
+ const holdBtn = $("#btn-lessons-hold-mic");
1333
+ holdBtn?.classList.add("recording");
1334
+ await startRecording(
1335
+ "lessons",
1336
+ $("#lessons-record-status"),
1337
+ $("#btn-lessons-record-start"),
1338
+ $("#btn-lessons-record-stop")
1339
+ );
1340
+ }
1341
+
1342
+ async function stopLessonsHoldMic(e) {
1343
+ if (!state.holdMicActive) return;
1344
+ state.holdMicActive = false;
1345
+ e?.preventDefault();
1346
+ $("#btn-lessons-hold-mic")?.classList.remove("recording");
1347
+ const path = await stopRecording(
1348
+ $("#lessons-record-status"),
1349
+ $("#btn-lessons-record-start"),
1350
+ $("#btn-lessons-record-stop")
1351
+ );
1352
+ if (path) await sendLanguageLessonTurn({ audioPath: path });
1353
+ }
1354
+
1355
+ async function sendLessonsFromRecording() {
1356
+ let path = state.pendingLessonsAudioPath;
1357
+ const file = $("#lessons-audio-upload")?.files?.[0];
1358
  if (file) path = await uploadFile(file);
1359
  if (!path) {
1360
+ showError("Record or upload audio first.");
1361
  return;
1362
  }
1363
+ await sendLanguageLessonTurn({ audioPath: path });
1364
+ state.pendingLessonsAudioPath = null;
1365
+ }
1366
+
1367
+ async function sendVoiceFromRecording() {
1368
+ return sendLessonsFromRecording();
1369
  }
1370
 
1371
  async function startBrowserRecording(statusEl) {
 
1439
  path = data.path;
1440
  if (statusEl) statusEl.textContent = stripMd(data.status || "Recording saved.");
1441
  }
1442
+ if (state.recordingTarget === "lessons") state.pendingLessonsAudioPath = path;
 
1443
  state.recordingTarget = null;
1444
  return path;
1445
  }
1446
 
 
 
 
 
 
 
 
 
 
 
 
1447
  function bindUi() {
1448
  $("#slide-count").addEventListener("input", (e) => {
1449
  $("#slide-count-val").textContent = e.target.value;
 
1510
  });
1511
 
1512
  $("#btn-generate").addEventListener("click", () => generateSlides().catch(() => {}));
1513
+
1514
+ $("#btn-lessons-send")?.addEventListener("click", () => sendLessonsTurn().catch(() => {}));
1515
+ $("#lessons-message")?.addEventListener("keydown", (e) => {
1516
+ if (e.key === "Enter" && !e.shiftKey) {
1517
+ e.preventDefault();
1518
+ sendLessonsTurn().catch(() => {});
1519
+ }
1520
+ });
1521
+ $("#btn-lessons-discover")?.addEventListener("click", () => discoverLessonsSources().catch(() => {}));
1522
+ $("#btn-lessons-auto-ingest")?.addEventListener("click", () => autoLessonsIngest().catch(() => {}));
1523
+ $("#btn-lessons-ingest")?.addEventListener("click", () => ingestLessonsSources().catch(() => {}));
1524
+ $("#lessons-ingest-file")?.addEventListener("change", () => ingestLessonsSources().catch(() => {}));
1525
+ $("#btn-lessons-clear")?.addEventListener("click", () => clearLessonsConversation().catch(() => {}));
1526
+ $("#lessons-language")?.addEventListener("change", syncLessonsLanguageUi);
1527
+ $("#lessons-other-lang")?.addEventListener("input", syncLessonsLanguageUi);
1528
+ $("#lessons-audio-upload")?.addEventListener("change", () => sendLessonsTurn().catch(() => {}));
1529
+
1530
+ const holdMic = $("#btn-lessons-hold-mic");
1531
+ if (holdMic) {
1532
+ holdMic.addEventListener("mousedown", (e) => startLessonsHoldMic(e).catch(() => {}));
1533
+ holdMic.addEventListener("mouseup", (e) => stopLessonsHoldMic(e).catch(() => {}));
1534
+ holdMic.addEventListener("mouseleave", (e) => {
1535
+ if (state.holdMicActive) stopLessonsHoldMic(e).catch(() => {});
1536
+ });
1537
+ holdMic.addEventListener("touchstart", (e) => startLessonsHoldMic(e).catch(() => {}), { passive: false });
1538
+ holdMic.addEventListener("touchend", (e) => stopLessonsHoldMic(e).catch(() => {}));
1539
+ }
1540
+
1541
+ $("#btn-lessons-record-start")?.addEventListener("click", () =>
1542
+ startRecording(
1543
+ "lessons",
1544
+ $("#lessons-record-status"),
1545
+ $("#btn-lessons-record-start"),
1546
+ $("#btn-lessons-record-stop")
1547
+ ).catch(() => {})
1548
+ );
1549
+ $("#btn-lessons-record-stop")?.addEventListener("click", () =>
1550
+ stopRecording(
1551
+ $("#lessons-record-status"),
1552
+ $("#btn-lessons-record-start"),
1553
+ $("#btn-lessons-record-stop")
1554
+ ).catch(() => {})
1555
+ );
1556
+
1557
  $("#btn-debug-send").addEventListener("click", () => sendDebugMessage().catch(() => {}));
1558
+
1559
  $("#debug-session")?.addEventListener("change", () => refreshDebugDocuments().catch(() => {}));
1560
  $("#debug-refresh-sessions")?.addEventListener("click", () => {
1561
  refreshDebugSessions().catch(() => {});
 
1569
  }
1570
  });
1571
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1572
  $("#btn-export").addEventListener("click", () => {
1573
  const p = state.downloads?.pptx;
1574
  if (p) window.open(fileUrl(p), "_blank");
 
1587
  refreshDocuments().catch(() => {});
1588
  });
1589
 
1590
+ document.querySelectorAll("#lessons-modes .mode-card").forEach((btn) => {
1591
  btn.addEventListener("click", () => {
1592
+ document.querySelectorAll("#lessons-modes .mode-card").forEach((b) => b.classList.remove("active"));
1593
  btn.classList.add("active");
1594
+ state.lessonsMode = btn.dataset.mode;
1595
+ syncLessonsModeUi();
1596
  });
1597
  });
1598
 
1599
+ syncLessonsModeUi();
1600
  }
1601
 
1602
  bindUi();
libs/echocoach/src/echocoach/config.py CHANGED
@@ -45,12 +45,23 @@ class EchoCoachConfig:
45
  tts_preset: str
46
  realtime_tts_preset: str | None
47
  coach_model: str
 
48
  max_seconds: int
49
  languages: list[LanguageOption]
50
  asr_presets: dict[str, AsrPreset]
51
  tts_presets: dict[str, TtsPreset]
52
  presets_path: Path | None = None
53
 
 
 
 
 
 
 
 
 
 
 
54
  def get_asr(self, key: str | None = None) -> AsrPreset:
55
  preset_key = key or self.asr_preset
56
  if preset_key not in self.asr_presets:
@@ -114,6 +125,7 @@ def _builtin_config() -> EchoCoachConfig:
114
  tts_preset="piper-multilingual",
115
  realtime_tts_preset=None,
116
  coach_model="minicpm5-1b",
 
117
  max_seconds=30,
118
  languages=langs,
119
  asr_presets=asr,
@@ -201,11 +213,15 @@ def load_echo_coach_config() -> EchoCoachConfig:
201
  if tts_default not in tts_presets:
202
  tts_default = next(iter(tts_presets))
203
 
 
 
 
204
  config = EchoCoachConfig(
205
  asr_preset=asr_default,
206
  tts_preset=tts_default,
207
  realtime_tts_preset=defaults.get("realtime_tts_preset"),
208
  coach_model=str(defaults.get("coach_model", "minicpm5-1b")),
 
209
  max_seconds=int(defaults.get("max_seconds", 30)),
210
  languages=languages,
211
  asr_presets=asr_presets,
@@ -222,6 +238,12 @@ def load_echo_coach_config() -> EchoCoachConfig:
222
  updates["realtime_tts_preset"] = os.environ["ECHOCOACH_REALTIME_TTS_PRESET"]
223
  if os.environ.get("ECHOCOACH_COACH_MODEL"):
224
  updates["coach_model"] = os.environ["ECHOCOACH_COACH_MODEL"]
225
  if os.environ.get("ECHOCOACH_MAX_SECONDS"):
226
  updates["max_seconds"] = int(os.environ["ECHOCOACH_MAX_SECONDS"])
227
 
 
45
  tts_preset: str
46
  realtime_tts_preset: str | None
47
  coach_model: str
48
+ coach_fallbacks: tuple[str, ...]
49
  max_seconds: int
50
  languages: list[LanguageOption]
51
  asr_presets: dict[str, AsrPreset]
52
  tts_presets: dict[str, TtsPreset]
53
  presets_path: Path | None = None
54
 
55
+ def coach_model_chain(self) -> list[str]:
56
+ """Primary coach preset followed by fallbacks (deduped, order preserved)."""
57
+ chain: list[str] = []
58
+ seen: set[str] = set()
59
+ for key in (self.coach_model, *self.coach_fallbacks):
60
+ if key and key not in seen:
61
+ seen.add(key)
62
+ chain.append(key)
63
+ return chain
64
+
65
  def get_asr(self, key: str | None = None) -> AsrPreset:
66
  preset_key = key or self.asr_preset
67
  if preset_key not in self.asr_presets:
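
The `coach_model_chain()` method added in this hunk returns the primary coach preset followed by its fallbacks, skipping empty keys and duplicates while keeping the original order. Below is a minimal standalone sketch of that ordering. `CoachConfigSketch` is a hypothetical stand-in that carries only the two relevant fields, and the preset names in the example are illustrative rather than taken from a real config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoachConfigSketch:
    """Hypothetical stand-in for EchoCoachConfig with only the coach fields."""

    coach_model: str
    coach_fallbacks: tuple[str, ...] = ()

    def coach_model_chain(self) -> list[str]:
        # Same logic as the method in the diff: primary first, then fallbacks,
        # skipping empty keys and anything already seen.
        chain: list[str] = []
        seen: set[str] = set()
        for key in (self.coach_model, *self.coach_fallbacks):
            if key and key not in seen:
                seen.add(key)
                chain.append(key)
        return chain


cfg = CoachConfigSketch(
    coach_model="tiny-aya-global",
    coach_fallbacks=("minicpm5-1b", "tiny-aya-global", ""),
)
print(cfg.coach_model_chain())  # ['tiny-aya-global', 'minicpm5-1b']
```

The frontend hunk above reads `coach_chain_labels[0]` and `coach_chain_labels[1]` as primary and auto-fallback, so a stable, deduplicated order presumably matters for what the UI displays.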
 
125
  tts_preset="piper-multilingual",
126
  realtime_tts_preset=None,
127
  coach_model="minicpm5-1b",
128
+ coach_fallbacks=(),
129
  max_seconds=30,
130
  languages=langs,
131
  asr_presets=asr,
 
213
  if tts_default not in tts_presets:
214
  tts_default = next(iter(tts_presets))
215
 
216
+ raw_fallbacks = defaults.get("coach_fallbacks") or []
217
+ coach_fallbacks = tuple(str(item) for item in raw_fallbacks)
218
+
219
  config = EchoCoachConfig(
220
  asr_preset=asr_default,
221
  tts_preset=tts_default,
222
  realtime_tts_preset=defaults.get("realtime_tts_preset"),
223
  coach_model=str(defaults.get("coach_model", "minicpm5-1b")),
224
+ coach_fallbacks=coach_fallbacks,
225
  max_seconds=int(defaults.get("max_seconds", 30)),
226
  languages=languages,
227
  asr_presets=asr_presets,
 
238
  updates["realtime_tts_preset"] = os.environ["ECHOCOACH_REALTIME_TTS_PRESET"]
239
  if os.environ.get("ECHOCOACH_COACH_MODEL"):
240
  updates["coach_model"] = os.environ["ECHOCOACH_COACH_MODEL"]
241
+ if os.environ.get("ECHOCOACH_COACH_FALLBACK"):
242
+ updates["coach_fallbacks"] = tuple(
243
+ part.strip()
244
+ for part in os.environ["ECHOCOACH_COACH_FALLBACK"].split(",")
245
+ if part.strip()
246
+ )
247
  if os.environ.get("ECHOCOACH_MAX_SECONDS"):
248
  updates["max_seconds"] = int(os.environ["ECHOCOACH_MAX_SECONDS"])
249
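
The `ECHOCOACH_COACH_FALLBACK` override in this file is parsed as a comma-separated list, with surrounding whitespace trimmed and empty entries dropped. The sketch below isolates that parsing. `parse_fallbacks` and `fallback_override` are hypothetical helper names (the diff inlines the logic inside `load_echo_coach_config()`), and the environment is passed in as a plain mapping so the example does not depend on the real process environment.

```python
import os


def parse_fallbacks(raw: str) -> tuple[str, ...]:
    """Split a comma-separated value into preset keys, dropping blanks."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def fallback_override(environ=os.environ):
    """Return the override tuple, or None when the variable is unset or empty."""
    raw = environ.get("ECHOCOACH_COACH_FALLBACK")
    return parse_fallbacks(raw) if raw else None


example_env = {"ECHOCOACH_COACH_FALLBACK": " minicpm5-1b, ,tiny-aya-global ,"}
print(fallback_override(example_env))  # ('minicpm5-1b', 'tiny-aya-global')
print(fallback_override({}))           # None
```

One consequence worth noting: a value made up only of commas or spaces is still truthy, so it parses to an empty tuple and clears any fallbacks loaded from the presets file instead of leaving them untouched.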
 
libs/echocoach/src/echocoach/pipeline.py CHANGED
@@ -64,9 +64,14 @@ def run_echo_coach(
64
  transcript = asr.transcribe(str(clipped_path), language=language)
65
  trace.log_note("asr_complete", preset=asr_key, chars=len(transcript))
66
 
67
- fillers = analyze_fillers(transcript)
68
  pace = analyze_pace(transcript, duration)
69
- transcript_html = highlight_fillers_html(transcript, fillers)
70
 
71
  filler_chart, pace_chart = build_charts(
72
  transcript,
 
64
  transcript = asr.transcribe(str(clipped_path), language=language)
65
  trace.log_note("asr_complete", preset=asr_key, chars=len(transcript))
66
 
67
+ fillers = analyze_fillers(transcript) if language == "en" else FillerAnalysis(counts={}, spans=[], total=0)
68
  pace = analyze_pace(transcript, duration)
69
+ if language == "en":
70
+ transcript_html = highlight_fillers_html(transcript, fillers)
71
+ else:
72
+ import html
73
+
74
+ transcript_html = html.escape(transcript).replace("\n", "<br>")
75
 
76
  filler_chart, pace_chart = build_charts(
77
  transcript,
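
For non-English input, the pipeline change above skips filler highlighting and renders the transcript as escaped HTML with line breaks preserved. The sketch below reproduces that fallback on its own. `plain_transcript_html` is a hypothetical helper name; the diff performs the same two calls inline.

```python
import html


def plain_transcript_html(transcript: str) -> str:
    """Escape a transcript for display and keep its line breaks."""
    # Escape first, then add <br>, so the inserted tags are not escaped themselves.
    return html.escape(transcript).replace("\n", "<br>")


print(plain_transcript_html("Bonjour <b>à tous</b> & merci\nÀ bientôt"))
```

The branch in the diff tests `language == "en"` exactly, so a regional tag such as `en-US` would take this plain-text path unless callers normalize the code first; whether they do is not visible in this hunk.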
libs/echocoach/src/echocoach/prompts.py CHANGED
@@ -12,22 +12,49 @@ MODE_LABELS: dict[TeacherVoiceMode, str] = {
12
  "pitch": "Pitch practice",
13
  }
14
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
  EXPLAIN_SYSTEM = """You are TeacherVoice, a friendly tutor who explains ideas in plain language.
16
  Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
17
  numbered outlines, or phrases like "let me think" or "first I need to".
18
- Use simple examples when helpful. If the student asks in another language, reply in that language.
19
  When source excerpts are provided, ground your answer in them and cite with [1], [2], etc."""
20
 
21
  LESSON_SYSTEM = """You are TeacherVoice, a lesson-planning coach for teachers and students.
22
  Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
23
  or meta commentary about how you will answer.
24
  Help outline and explain lesson content verbally: learning goals, key points, and a simple flow.
25
- If a lesson topic is set, stay focused on it. When source excerpts are provided, use them and cite [1], [2], etc."""
 
26
 
27
  PITCH_SYSTEM = """You are TeacherVoice, a supportive public-speaking coach in a live conversation.
28
  Give brief, actionable feedback on what the student just said (opening, clarity, energy, structure).
29
  Do not produce JSON or long reports — speak naturally in 2-4 sentences.
30
- Suggest one concrete improvement for their next attempt. For charts and pace analysis, expand **Deep pitch analysis** below the chat."""
31
 
32
  _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
33
  "explain": EXPLAIN_SYSTEM,
@@ -36,8 +63,39 @@ _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
36
  }
37
 
38
 
39
- def system_prompt_for_mode(mode: TeacherVoiceMode) -> str:
40
- return _MODE_SYSTEM[mode]
41
 
42
 
43
  def topic_context_block(topic: str | None, mode: TeacherVoiceMode) -> str | None:
 
12
  "pitch": "Pitch practice",
13
  }
14
 
15
+ LANGUAGE_LESSON_MODES: frozenset[TeacherVoiceMode] = frozenset({"explain", "lesson"})
16
+
17
+ # ISO 639-1 codes mapped to Tiny Aya regional presets (see Cohere Labs field guide).
18
+ _AYA_FIRE_LANGS = frozenset({"hi", "bn", "ta", "te", "mr", "gu", "kn", "ml", "pa", "ur", "ne", "si"})
19
+ _AYA_EARTH_LANGS = frozenset({"ar", "sw", "am", "ha", "fa", "he", "so", "yo", "ig", "zu", "af"})
20
+ _AYA_WATER_LANGS = frozenset(
21
+ {"fr", "de", "es", "it", "pt", "nl", "pl", "el", "ja", "zh", "ko", "vi", "ru", "uk", "cs", "sv", "da", "fi", "no"}
22
+ )
23
+
24
+ _LANGUAGE_LABELS: dict[str, str] = {
25
+ "en": "English",
26
+ "fr": "French",
27
+ "de": "German",
28
+ "es": "Spanish",
29
+ "it": "Italian",
30
+ "pt": "Portuguese",
31
+ "nl": "Dutch",
32
+ "pl": "Polish",
33
+ "el": "Greek",
34
+ "ar": "Arabic",
35
+ "ja": "Japanese",
36
+ "zh": "Chinese",
37
+ "vi": "Vietnamese",
38
+ "ko": "Korean",
39
+ }
40
+
41
  EXPLAIN_SYSTEM = """You are TeacherVoice, a friendly tutor who explains ideas in plain language.
42
  Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
43
  numbered outlines, or phrases like "let me think" or "first I need to".
44
+ Use simple examples when helpful.
45
  When source excerpts are provided, ground your answer in them and cite with [1], [2], etc."""
46
 
47
  LESSON_SYSTEM = """You are TeacherVoice, a lesson-planning coach for teachers and students.
48
  Reply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,
49
  or meta commentary about how you will answer.
50
  Help outline and explain lesson content verbally: learning goals, key points, and a simple flow.
51
+ If a lesson topic is set, stay focused on it.
52
+ When source excerpts are provided, use them and cite [1], [2], etc."""
53
 
54
  PITCH_SYSTEM = """You are TeacherVoice, a supportive public-speaking coach in a live conversation.
55
  Give brief, actionable feedback on what the student just said (opening, clarity, energy, structure).
56
  Do not produce JSON or long reports — speak naturally in 2-4 sentences.
57
+ Suggest one concrete improvement for their next attempt. For charts and pace analysis, use Classic EchoCoach."""
58
 
59
  _MODE_SYSTEM: dict[TeacherVoiceMode, str] = {
60
  "explain": EXPLAIN_SYSTEM,
 
63
  }
64
 
65
 
66
+ def language_label(language: str) -> str:
67
+ code = (language or "en").strip().lower().split("-")[0]
68
+ return _LANGUAGE_LABELS.get(code, code or "English")
69
+
70
+
71
+ def language_instruction(language: str) -> str:
72
+ label = language_label(language)
73
+ return (
74
+ f"Target language: {label} ({language}). "
75
+ f"Reply in {label} by default. "
76
+ "If the student writes or speaks in another language, match their language instead."
77
+ )
78
+
79
+
80
+ def resolve_aya_preset(language: str, variant: str = "auto") -> str:
81
+ """Return a models.yaml preset key for the Tiny Aya coach.
82
+
83
+ Regional Water/Fire/Earth presets remain in models.yaml for future use, but they
84
+ currently resolve to Global so that Spaces load only one gated model.
85
+ """
86
+ _ = language # language kept for API compatibility; Global handles 70+ langs
87
+ if variant and variant not in ("auto", ""):
88
+ if variant in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth"):
89
+ return "tiny-aya-global"
90
+ return variant
91
+ return "tiny-aya-global"
92
+
93
+
94
+ def system_prompt_for_mode(mode: TeacherVoiceMode, *, language: str | None = None) -> str:
95
+ base = _MODE_SYSTEM[mode]
96
+ if language:
97
+ return f"{base}\n\n{language_instruction(language)}"
98
+ return base
99
 
100
 
101
  def topic_context_block(topic: str | None, mode: TeacherVoiceMode) -> str | None:
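
The helpers added to `prompts.py` normalize a language tag to a display label, collapse the regional Tiny Aya variants to the Global preset, and append a language instruction to the mode prompt. The sketch below exercises those three behaviors in isolation. It uses a three-entry subset of the label table, and `compose_system_prompt` is a hypothetical simplification that appends only the first sentence of the instruction to an arbitrary base string.

```python
# Subset of the label table from the diff; unknown codes fall back to the code itself.
_LANGUAGE_LABELS = {"en": "English", "fr": "French", "ja": "Japanese"}
_REGIONAL = ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth")


def language_label(language: str) -> str:
    # "fr-CA" -> "fr"; empty or missing input is treated as English.
    code = (language or "en").strip().lower().split("-")[0]
    return _LANGUAGE_LABELS.get(code, code or "English")


def resolve_aya_preset(language: str, variant: str = "auto") -> str:
    # Regional variants collapse to Global; any other explicit key passes through.
    if variant and variant not in ("auto", ""):
        return "tiny-aya-global" if variant in _REGIONAL else variant
    return "tiny-aya-global"


def compose_system_prompt(base: str, language=None) -> str:
    """Hypothetical helper mirroring the append step in system_prompt_for_mode."""
    if language:
        return f"{base}\n\nTarget language: {language_label(language)} ({language})."
    return base


print(language_label("fr-CA"))                     # French
print(language_label("sw"))                        # sw
print(resolve_aya_preset("hi", "tiny-aya-fire"))   # tiny-aya-global
print(resolve_aya_preset("hi", "minicpm5-1b"))     # minicpm5-1b
```

Languages missing from the label table, such as `sw` here, are shown to the model as the bare code. That may be acceptable for a demo, but extending the table would give the model a clearer instruction.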
libs/echocoach/src/echocoach/teacher_voice.py CHANGED
@@ -168,6 +168,7 @@ def _rag_turn_via_agent(
168
  model_key: str,
169
  backend: InferenceBackend,
170
  trace: TraceRecorder,
 
171
  ) -> tuple[str, str | None, str | None, str]:
172
  """Grounded answer via ResearchMind harness. Returns text, refs, status, display."""
173
  query = retrieval_query(user_text, topic=topic)
@@ -205,6 +206,7 @@ def _rag_turn_via_agent(
205
  mode=mode,
206
  backend=backend,
207
  trace=trace,
 
208
  )
209
  rag_refs = result.references_markdown or None
210
  return assistant_text, rag_refs, rag_status, display_reply
@@ -237,13 +239,14 @@ def _compact_teacher_reply(
237
  mode: TeacherVoiceMode,
238
  backend: InferenceBackend,
239
  trace: TraceRecorder,
 
240
  ) -> str:
241
  seed = strip_reasoning_output(raw_reply).strip() or raw_reply.strip()[:1200]
242
  messages = [
243
  {
244
  "role": "system",
245
  "content": (
246
- f"{system_prompt_for_mode(mode)}\n\n"
247
  "Rewrite the draft below into ONLY 2-4 spoken sentences for voice playback. "
248
  "Keep any [n] citations. No planning or labels."
249
  ),
@@ -263,6 +266,7 @@ def _finalize_voice_reply(
263
  mode: TeacherVoiceMode,
264
  backend: InferenceBackend,
265
  trace: TraceRecorder,
 
266
  ) -> tuple[str, str]:
267
  """Normalize model output into a complete spoken reply and chat display text."""
268
  assistant_text = strip_reasoning_output(raw_reply).strip()
@@ -278,6 +282,7 @@ def _finalize_voice_reply(
278
  mode=mode,
279
  backend=backend,
280
  trace=trace,
 
281
  )
282
  if not reply_ends_complete_sentence(assistant_text):
283
  assistant_text = _compact_teacher_reply(
@@ -285,6 +290,7 @@ def _finalize_voice_reply(
285
  mode=mode,
286
  backend=backend,
287
  trace=trace,
 
288
  )
289
  return assistant_text, assistant_text
290
 
@@ -296,8 +302,9 @@ def build_teacher_messages(
296
  user_text: str,
297
  topic: str | None = None,
298
  rag: RagContext | None = None,
 
299
  ) -> list[dict[str, str]]:
300
- system = system_prompt_for_mode(mode)
301
  topic_line = topic_context_block(topic, mode)
302
  if topic_line:
303
  system = f"{system}\n\n{topic_line}"
@@ -330,6 +337,7 @@ def _generate_teacher_reply(
330
  session_id: str,
331
  doc_ids: list[str] | None,
332
  tts_key: str,
 
333
  ) -> TeacherVoiceTurnResult:
334
  rag_refs: str | None = None
335
  rag_status: str | None = None
@@ -344,6 +352,7 @@ def _generate_teacher_reply(
344
  model_key=model_key,
345
  backend=backend,
346
  trace=trace,
 
347
  )
348
  else:
349
  messages = build_teacher_messages(
@@ -351,6 +360,7 @@ def _generate_teacher_reply(
351
  history=history,
352
  user_text=user_text,
353
  topic=topic,
 
354
  )
355
  raw_reply = backend.chat(messages, max_tokens=512, temperature=0.2)
356
  assistant_text, display_reply = _finalize_voice_reply(
@@ -358,20 +368,25 @@ def _generate_teacher_reply(
358
  mode=mode,
359
  backend=backend,
360
  trace=trace,
 
361
  )
362
  trace.log_llm(messages[-1]["content"], raw_reply)
363
  if mode in RAG_MODES:
364
  rag_status = _rag_off_status(session_id, doc_ids)
365
 
366
- voiceout_path, voiceout_first, voiceout_warning = synthesize_voice_reply(
367
- strip_references_for_tts(assistant_text),
368
- language=language,
369
- tts_preset=tts_key,
370
- chunk_first=True,
371
- out_subdir="teacher_voice",
372
- )
373
- if voiceout_path:
374
- trace.set_artifact(voiceout_path)
375
 
376
  new_history = append_chat_turn(
377
  history,
@@ -409,6 +424,7 @@ def run_teacher_voice_text_turn(
409
  use_rag: bool = False,
410
  session_id: str = "",
411
  doc_ids: list[str] | None = None,
 
412
  ) -> TeacherVoiceTurnResult:
413
  """Process a typed user message (skips ASR)."""
414
  user_text = user_text.strip()
@@ -451,6 +467,7 @@ def run_teacher_voice_text_turn(
451
  session_id=session_id,
452
  doc_ids=doc_ids,
453
  tts_key=tts_key,
 
454
  )
455
 
456
 
@@ -469,6 +486,7 @@ def run_teacher_voice_turn(
469
  session_id: str = "",
470
  doc_ids: list[str] | None = None,
471
  max_turn_seconds: int | None = None,
 
472
  ) -> TeacherVoiceTurnResult:
473
  if not audio_path:
474
  raise ValueError("No audio recording provided.")
@@ -512,7 +530,7 @@ def run_teacher_voice_turn(
512
  from echocoach.omni import is_omni_profile, try_omni_turn
513
 
514
  if is_omni_profile():
515
- system = system_prompt_for_mode(mode)
516
  topic_line = topic_context_block(topic, mode)
517
  if topic_line:
518
  system = f"{system}\n\n{topic_line}"
@@ -559,4 +577,5 @@ def run_teacher_voice_turn(
559
  session_id=session_id,
560
  doc_ids=doc_ids,
561
  tts_key=tts_key,
 
562
  )
 
168
  model_key: str,
169
  backend: InferenceBackend,
170
  trace: TraceRecorder,
171
+ language: str = "en",
172
  ) -> tuple[str, str | None, str | None, str]:
173
  """Grounded answer via ResearchMind harness. Returns text, refs, status, display."""
174
  query = retrieval_query(user_text, topic=topic)
 
206
  mode=mode,
207
  backend=backend,
208
  trace=trace,
209
+ language=language,
210
  )
211
  rag_refs = result.references_markdown or None
212
  return assistant_text, rag_refs, rag_status, display_reply
 
239
  mode: TeacherVoiceMode,
240
  backend: InferenceBackend,
241
  trace: TraceRecorder,
242
+ language: str = "en",
243
  ) -> str:
244
  seed = strip_reasoning_output(raw_reply).strip() or raw_reply.strip()[:1200]
245
  messages = [
246
  {
247
  "role": "system",
248
  "content": (
249
+ f"{system_prompt_for_mode(mode, language=language)}\n\n"
250
  "Rewrite the draft below into ONLY 2-4 spoken sentences for voice playback. "
251
  "Keep any [n] citations. No planning or labels."
252
  ),
 
266
  mode: TeacherVoiceMode,
267
  backend: InferenceBackend,
268
  trace: TraceRecorder,
269
+ language: str = "en",
270
  ) -> tuple[str, str]:
271
  """Normalize model output into a complete spoken reply and chat display text."""
272
  assistant_text = strip_reasoning_output(raw_reply).strip()
 
282
  mode=mode,
283
  backend=backend,
284
  trace=trace,
285
+ language=language,
286
  )
287
  if not reply_ends_complete_sentence(assistant_text):
288
  assistant_text = _compact_teacher_reply(
 
290
  mode=mode,
291
  backend=backend,
292
  trace=trace,
293
+ language=language,
294
  )
295
  return assistant_text, assistant_text
296
 
 
302
  user_text: str,
303
  topic: str | None = None,
304
  rag: RagContext | None = None,
305
+ language: str = "en",
306
  ) -> list[dict[str, str]]:
307
+ system = system_prompt_for_mode(mode, language=language)
308
  topic_line = topic_context_block(topic, mode)
309
  if topic_line:
310
  system = f"{system}\n\n{topic_line}"
 
337
  session_id: str,
338
  doc_ids: list[str] | None,
339
  tts_key: str,
340
+ auto_voiceout: bool = True,
341
  ) -> TeacherVoiceTurnResult:
342
  rag_refs: str | None = None
343
  rag_status: str | None = None
 
352
  model_key=model_key,
353
  backend=backend,
354
  trace=trace,
355
+ language=language,
356
  )
357
  else:
358
  messages = build_teacher_messages(
 
360
  history=history,
361
  user_text=user_text,
362
  topic=topic,
363
+ language=language,
364
  )
365
  raw_reply = backend.chat(messages, max_tokens=512, temperature=0.2)
366
  assistant_text, display_reply = _finalize_voice_reply(
 
368
  mode=mode,
369
  backend=backend,
370
  trace=trace,
371
+ language=language,
372
  )
373
  trace.log_llm(messages[-1]["content"], raw_reply)
374
  if mode in RAG_MODES:
375
  rag_status = _rag_off_status(session_id, doc_ids)
376
 
377
+ voiceout_path: str | None = None
378
+ voiceout_first: str | None = None
379
+ voiceout_warning: str | None = None
380
+ if auto_voiceout:
381
+ voiceout_path, voiceout_first, voiceout_warning = synthesize_voice_reply(
382
+ strip_references_for_tts(assistant_text),
383
+ language=language,
384
+ tts_preset=tts_key,
385
+ chunk_first=True,
386
+ out_subdir="teacher_voice",
387
+ )
388
+ if voiceout_path:
389
+ trace.set_artifact(voiceout_path)
390
 
391
  new_history = append_chat_turn(
392
  history,
 
424
  use_rag: bool = False,
425
  session_id: str = "",
426
  doc_ids: list[str] | None = None,
427
+ auto_voiceout: bool = True,
428
  ) -> TeacherVoiceTurnResult:
429
  """Process a typed user message (skips ASR)."""
430
  user_text = user_text.strip()
 
467
  session_id=session_id,
468
  doc_ids=doc_ids,
469
  tts_key=tts_key,
470
+ auto_voiceout=auto_voiceout,
471
  )
472
 
473
 
 
486
  session_id: str = "",
487
  doc_ids: list[str] | None = None,
488
  max_turn_seconds: int | None = None,
489
+ auto_voiceout: bool = True,
490
  ) -> TeacherVoiceTurnResult:
491
  if not audio_path:
492
  raise ValueError("No audio recording provided.")
 
530
  from echocoach.omni import is_omni_profile, try_omni_turn
531
 
532
  if is_omni_profile():
533
+ system = system_prompt_for_mode(mode, language=language)
534
  topic_line = topic_context_block(topic, mode)
535
  if topic_line:
536
  system = f"{system}\n\n{topic_line}"
 
577
  session_id=session_id,
578
  doc_ids=doc_ids,
579
  tts_key=tts_key,
580
+ auto_voiceout=auto_voiceout,
581
  )
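
The hunks above gate TTS synthesis behind the new `auto_voiceout` flag and pass `language` through to the synthesizer. A minimal sketch of that gate follows, assuming a stand-in synthesizer: `maybe_voiceout` and `fake_synthesize` are hypothetical names, and the real `synthesize_voice_reply` takes more arguments than this diff fully shows.

```python
from typing import Callable, Optional, Tuple

# (audio path, first streamed chunk, warning), mirroring the three values
# unpacked in the diff; the element meanings are an assumption.
VoiceOut = Tuple[Optional[str], Optional[str], Optional[str]]


def maybe_voiceout(
    text: str,
    *,
    language: str = "en",
    auto_voiceout: bool = True,
    synthesize: Callable[..., VoiceOut],
) -> VoiceOut:
    """Return the synthesizer result, or all None when voice-out is disabled."""
    if not auto_voiceout:
        return None, None, None
    return synthesize(text, language=language)


def fake_synthesize(text: str, *, language: str) -> VoiceOut:
    # Hypothetical stand-in: pretends a file was written and reports no warning.
    return f"teacher_voice/{language}.wav", None, None
```

With `auto_voiceout=False` the caller still gets a three-element tuple, so downstream code can unpack it without branching.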
libs/echocoach/tests/test_teacher_voice.py CHANGED
@@ -7,7 +7,7 @@ import pytest
7
  import soundfile as sf
8
 
9
  from inference.response_clean import reply_ends_complete_sentence
10
- from echocoach.prompts import PITCH_SYSTEM, system_prompt_for_mode
11
  from echocoach.teacher_voice import (
12
  RagContext,
13
  append_chat_turn,
@@ -131,8 +131,43 @@ def test_build_teacher_messages_includes_topic_and_rag():
131
  assert "Reply now in 2-4 complete spoken sentences only" in messages[-1]["content"]
132
 
133
 
134
  def test_pitch_mode_system_prompt():
135
- assert "Deep pitch analysis" in system_prompt_for_mode("pitch")
136
  assert PITCH_SYSTEM == system_prompt_for_mode("pitch")
137
 
138
 
 
7
  import soundfile as sf
8
 
9
  from inference.response_clean import reply_ends_complete_sentence
10
+ from echocoach.prompts import PITCH_SYSTEM, resolve_aya_preset, system_prompt_for_mode
11
  from echocoach.teacher_voice import (
12
  RagContext,
13
  append_chat_turn,
 
131
  assert "Reply now in 2-4 complete spoken sentences only" in messages[-1]["content"]
132
 
133
 
134
+ def test_coach_model_chain_dedupes():
135
+ from echocoach.config import EchoCoachConfig, LanguageOption
136
+
137
+ cfg = EchoCoachConfig(
138
+ asr_preset="whisper-cpp-tiny",
139
+ tts_preset="piper-multilingual",
140
+ realtime_tts_preset=None,
141
+ coach_model="tiny-aya-global",
142
+ coach_fallbacks=("minicpm5-1b", "tiny-aya-global"),
143
+ max_seconds=30,
144
+ languages=[LanguageOption("en", "English")],
145
+ asr_presets={},
146
+ tts_presets={},
147
+ )
148
+ assert cfg.coach_model_chain() == ["tiny-aya-global", "minicpm5-1b"]
149
+
150
+
151
+ def test_resolve_aya_preset_uses_global_only():
152
+ assert resolve_aya_preset("fr", "auto") == "tiny-aya-global"
153
+ assert resolve_aya_preset("hi", "auto") == "tiny-aya-global"
154
+ assert resolve_aya_preset("en", "tiny-aya-water") == "tiny-aya-global"
155
+
156
+
157
+ def test_build_teacher_messages_includes_language_instruction():
158
+ messages = build_teacher_messages(
159
+ mode="lesson",
160
+ history=[],
161
+ user_text="Explique le fine-tuning.",
162
+ topic="ML",
163
+ language="fr",
164
+ )
165
+ assert "Target language: French" in messages[0]["content"]
166
+ assert "Reply ONLY in French" in messages[0]["content"]
167
+
168
+
169
  def test_pitch_mode_system_prompt():
170
+ assert "public-speaking coach" in system_prompt_for_mode("pitch")
171
  assert PITCH_SYSTEM == system_prompt_for_mode("pitch")
172
 
173
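
`test_coach_model_chain_dedupes` expects the primary coach model first, followed by fallbacks with repeats dropped. The body of `EchoCoachConfig.coach_model_chain()` is not part of this diff, so the following is only a sketch of one way to satisfy that test, written as a free function with hypothetical parameters:

```python
from typing import Iterable, List


def coach_model_chain(primary: str, fallbacks: Iterable[str]) -> List[str]:
    """Primary model first, then fallbacks, dropping repeats but keeping order."""
    # dict preserves insertion order, so fromkeys() is an order-stable dedupe.
    return list(dict.fromkeys([primary, *fallbacks]))
```

An order-preserving dedupe matters here because the chain is tried front to back, and a `set` would lose the priority of the primary model.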
 
models.yaml CHANGED
@@ -67,3 +67,27 @@ models:
67
  backend: transformers
68
  model_id: ./models/finetuned/minicpm5-1b-lora-merged
69
  trust_remote_code: true
 
67
  backend: transformers
68
  model_id: ./models/finetuned/minicpm5-1b-lora-merged
69
  trust_remote_code: true
70
+
71
+ tiny-aya-global:
72
+ label: Tiny Aya Global 3.3B (multilingual coach)
73
+ backend: transformers
74
+ model_id: CohereLabs/tiny-aya-global
75
+ trust_remote_code: true
76
+
77
+ tiny-aya-water:
78
+ label: Tiny Aya Water 3.3B (European / Asia-Pacific)
79
+ backend: transformers
80
+ model_id: CohereLabs/tiny-aya-water
81
+ trust_remote_code: true
82
+
83
+ tiny-aya-fire:
84
+ label: Tiny Aya Fire 3.3B (South Asian)
85
+ backend: transformers
86
+ model_id: CohereLabs/tiny-aya-fire
87
+ trust_remote_code: true
88
+
89
+ tiny-aya-earth:
90
+ label: Tiny Aya Earth 3.3B (West Asian / African)
91
+ backend: transformers
92
+ model_id: CohereLabs/tiny-aya-earth
93
+ trust_remote_code: true
voice_models.yaml CHANGED
@@ -2,11 +2,13 @@
2
  # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
3
 
4
  defaults:
5
- asr_preset: whisper-cpp-tiny
6
  tts_preset: piper-multilingual
7
  # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
8
  realtime_tts_preset: vibevoice-realtime-0.5b
9
- coach_model: minicpm5-1b
 
 
10
  max_seconds: 30
11
 
12
  languages:
@@ -75,7 +77,7 @@ tts:
75
  pt: pt_BR-faber-medium
76
  nl: nl_NL-mls-medium
77
  pl: pl_PL-darkman-medium
78
- el: en_US-lessac-medium
79
  ar: ar_JO-kareem-medium
80
  ja: ja_JP-natsuki-medium
81
  zh: zh_CN-huayan-medium
 
2
  # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
3
 
4
  defaults:
5
+ asr_preset: cohere-transcribe
6
  tts_preset: piper-multilingual
7
  # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
8
  realtime_tts_preset: vibevoice-realtime-0.5b
9
+ coach_model: tiny-aya-global
10
+ coach_fallbacks:
11
+ - minicpm5-1b
12
  max_seconds: 30
13
 
14
  languages:
 
77
  pt: pt_BR-faber-medium
78
  nl: nl_NL-mls-medium
79
  pl: pl_PL-darkman-medium
80
+ el: el_GR-rapunzelina-low
81
  ar: ar_JO-kareem-medium
82
  ja: ja_JP-natsuki-medium
83
  zh: zh_CN-huayan-medium
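
The comment at the top of `voice_models.yaml` says the defaults can be overridden through `ECHOCOACH_ASR_PRESET` / `ECHOCOACH_TTS_PRESET`. How the project actually reads `.env` is not shown in this diff. The sketch below only illustrates the precedence (environment first, YAML default second), using a hypothetical `resolve_preset` helper and a hard-coded copy of the new defaults:

```python
import os
from typing import Mapping, Optional

# Copied from the new `defaults:` block above; a real loader would parse the YAML.
DEFAULTS = {"asr_preset": "cohere-transcribe", "tts_preset": "piper-multilingual"}


def resolve_preset(kind: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the env override for 'asr' or 'tts' if set, else the YAML default."""
    env = os.environ if env is None else env
    override = env.get(f"ECHOCOACH_{kind.upper()}_PRESET", "").strip()
    return override or DEFAULTS[f"{kind}_preset"]
```

Passing `env` explicitly keeps the helper testable without touching the process environment.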