# Usage

How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).

The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.

## Prerequisites

- [uv](https://docs.astral.sh/uv/) installed
- Python 3.12 (see `.python-version`)
- For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org

## Local development

### 1. Install dependencies

```bash
uv sync --all-packages
```

### 2. Configure environment (optional)

```bash
cp .env.example .env
```

Edit `.env` if you want a different model preset. Default is `minicpm5-1b` (transformers).

### 3. Pre-download the model (optional for GGUF presets)

If you use a GGUF preset (`qwen3b-gguf`), pre-downloading avoids a long wait on first use:

```bash
uv run python scripts/download_model.py
```

Then add the printed path to `.env`:

```bash
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
```

### 4. Run the Gradio app

```bash
uv run --package gradio-space python -m gradio_space.app
```

Open [http://localhost:7860](http://localhost:7860).

| URL | UI |
|-----|-----|
| `/` | **Studio**: custom HTML/CSS/JS workspace (Off Brand entry) |
| `/classic` | **Classic**: full Gradio tabs, settings, Chat (debug) |

The header in Classic includes a link back to Studio UI.

The model loads on the **first Generate** (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates.

### Switching models locally (transformers ↔ llama.cpp)

For local dev you can switch presets at runtime without restarting:

```bash
# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6          # startup default (transformers)
```

| UI | Where to switch |
|----|-----------------|
| **Classic** (`/classic`) | **Settings** accordion → Model preset dropdown (reloads on change) |
| **Classic** Chat tab | Model preset dropdown (syncs app-wide) |
| **Studio** (`/`) | Settings drawer → Model preset; Debug tab has the same list |

| Goal | Preset key |
|------|------------|
| MiniCPM-V 4.6 transformers (full VLM) | `minicpm-v-4.6` |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | `minicpm-v-4.6-gguf` |
| MiniCPM5 1B text | `minicpm5-1b` |
| Lesson LoRA (transformers only) | `minicpm5-1b-lesson-lora` |

Prefetch the GGUF weights (optional):

```bash
uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
```

On Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`.
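
The pinning rule above can be sketched in Python. This is a hypothetical helper, not the app's actual code: `resolve_preset` and `KNOWN_PRESETS` are names invented for illustration, and the assumption that an unset `ALLOW_MODEL_SWITCH` means `false` is mine.

```python
import os

# Preset keys copied from the table above; the real list lives in models.yaml.
KNOWN_PRESETS = {
    "minicpm-v-4.6",
    "minicpm-v-4.6-gguf",
    "minicpm5-1b",
    "minicpm5-1b-lesson-lora",
}


def resolve_preset(requested, env=None):
    """Return the preset to load for a UI request (hypothetical helper)."""
    env = os.environ if env is None else env
    pinned = env.get("ACTIVE_MODEL", "minicpm5-1b")
    # Assumption: switching is off unless the variable is literally "true".
    allow = env.get("ALLOW_MODEL_SWITCH", "false").strip().lower() == "true"
    # Ignore the request when switching is disabled or the key is unknown.
    if not allow or requested not in KNOWN_PRESETS:
        return pinned
    return requested
```

With `ALLOW_MODEL_SWITCH=false`, every request resolves to the pinned `ACTIVE_MODEL`, which is the behavior recommended for a Space.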

### Lesson slides: research sources

The **Lesson slides** tab can ground outlines on external sources before building the deck:

| Source mode | What it does |
| ----------- | ------------ |
| **None (model only)** | Default: outline from the local model only |
| **Web search** | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| **RAG (indexed sources)** | Use a **ResearchMind session** and/or URLs/files you provide on this tab |

When **Web search** is selected, choose a **search workflow**:

| Workflow | Steps |
| -------- | ----- |
| **Two-step search (suggest & confirm)** | Click **Discover sources** → select URLs → **Generate lesson slides** |
| **Auto search & ingest** | Click **Generate lesson slides** only; search, ingest, and outline run in one step |

**RAG** mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.

Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).

### EchoCoach: voice practice

The **EchoCoach** tab records up to 30 seconds of audio, then runs a local pipeline (steps listed below).

**Getting audio in**

- **Record from this computer**: click **Start recording**, speak, then **Stop recording** (uses PipeWire `pw-record` when available). The slider is a max-length safety cap.
- **Browser Record**: needs mic permission and a secure context; open **http://localhost:7860** (not `0.0.0.0` or a LAN IP).
- **Upload**: drop a `.wav` or `.mp3` file (works everywhere, including HF Space).

If recordings sound silent, check system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`).

Pipeline steps:

1. **ASR**: Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
2. **Analysis**: filler highlights, pace score, matplotlib charts
3. **Coach**: rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`)
4. **VoiceOut**: Piper TTS speaks the summary (or full rewrite if checked)
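
To make the **Analysis** step concrete, here is a minimal sketch of filler counting and a pace figure. It is not the project's implementation: the filler list, the words-per-minute measure, and the function name `analyze_transcript` are all assumptions, and the tokenizer only handles Latin-alphabet text.

```python
import re

# Illustrative filler list; the project's own list is not shown in this document.
FILLERS = {"um", "uh", "like", "basically", "actually"}


def analyze_transcript(text, duration_seconds):
    """Count filler words and compute pace in words per minute."""
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-zA-Z']+", text.lower())
    filler_hits = [w for w in words if w in FILLERS]
    minutes = duration_seconds / 60.0
    # Guard against zero-length recordings.
    wpm = len(words) / minutes if minutes > 0 else 0.0
    return {
        "word_count": len(words),
        "filler_count": len(filler_hits),
        "words_per_minute": round(wpm, 1),
    }
```

A real pace score would probably map words per minute onto a target range; this sketch stops at the raw number.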

Install optional extras:

```bash
# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper

# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium
```

Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:

| Variable | Default | Description |
| -------- | ------- | ----------- |
| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo |
| `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
| `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
| `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
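
The interaction between `ECHOCOACH_COACH_MODEL` and the comma-separated `ECHOCOACH_COACH_FALLBACK` can be sketched as an ordered list of presets to try. Both function names are hypothetical, and the real loader may handle errors differently.

```python
import os


def coach_preset_chain(env=None):
    """Return coach presets in the order they would be tried (hypothetical helper)."""
    env = os.environ if env is None else env
    primary = env.get("ECHOCOACH_COACH_MODEL", "minicpm5-1b-language-lesson-hub")
    raw = env.get("ECHOCOACH_COACH_FALLBACK", "minicpm5-1b")
    fallbacks = [p.strip() for p in raw.split(",") if p.strip()]
    chain = []
    for preset in [primary, *fallbacks]:
        if preset not in chain:  # drop duplicates, keep the first occurrence
            chain.append(preset)
    return chain


def load_first_available(chain, loader):
    """Try each preset with `loader`; return (preset, model) for the first success."""
    errors = {}
    for preset in chain:
        try:
            return preset, loader(preset)
        except Exception as exc:  # a real loader would catch narrower errors
            errors[preset] = exc
    raise RuntimeError(f"no coach preset could be loaded: {errors}")
```

Here `loader` stands in for whatever function turns a preset key into a loaded model.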

**Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face: run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.

Smoke tests (analysis only, no GPU):

```bash
bash scripts/echo_coach_smoke.sh
```

### Language lessons: multilingual coach (Studio tab)

The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.

| Input | Output |
| ----- | ------ |
| Type a question | Chat bubble in target language |
| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
| **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |

**Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.

Rebuild training JSONL from Hugging Face sources:

```bash
uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
```

Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).

Default `.env` / Space secrets:

```bash
ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
```

| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content |

Turn-based (not full duplex): speak → wait → hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice.

Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`).

### TeacherVoice: Classic UI (turn-based)

The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher. It uses the same pipeline as Language lessons, plus a **Pitch practice** mode.

| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content verbally |
| **Pitch practice** | Short live speaking tips each turn |

**EchoCoach vs TeacherVoice**

| | EchoCoach | TeacherVoice |
| --- | --- | --- |
| Interaction | One-shot after **Analyze pitch** | Multi-turn **Send turn** |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |

**Flow per turn:** record up to **15s** → ASR → text LLM with chat history → Piper TTS (auto-plays when installed).

After each reply, use **Speak last reply** or **Speak first sentence** to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).

Install Piper for voice output (included in `gradio-space` deps after `uv sync`):

```bash
uv sync
python -m piper.download_voices en_US-lessac-medium
```

Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. **Restart the Gradio app** after installing Piper so the Speak buttons can synthesize audio.

**Realtime TTS (VibeVoice)**: [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text-in). TeacherVoice uses `realtime_tts_preset` from YAML by default; override with `ECHOCOACH_REALTIME_TTS_PRESET` or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.

Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).

Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`.

Optional omni profile (GPU, experimental; falls back to ASR+LLM+Piper):

```bash
ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
```

Unit tests (no GPU):

```bash
uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
```

### 5. Upload agent trace (Sharing is Caring badge)

```bash
uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
```

### 6. Quick sanity checks

```bash
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"

# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
```

### Local env reference


| Variable            | Default                           | Description                                |
| ------------------- | --------------------------------- | ------------------------------------------ |
| `INFERENCE_BACKEND` | `llama_cpp`                       | `llama_cpp` or `transformers`              |
| `MODEL_REPO`        | `Qwen/Qwen2.5-3B-Instruct-GGUF`   | Hub repo for GGUF                          |
| `MODEL_FILE`        | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename                              |
| `MODEL_PATH`        | (unset)                           | Local GGUF path (skips Hub download)       |
| `N_CTX`             | `4096`                            | Context window                             |
| `N_GPU_LAYERS`      | `0`                               | GPU layers for llama.cpp (`0` = CPU only)  |
| `PORT`              | `7860`                            | Gradio listen port                         |
| `MODEL_ID`          | `Qwen/Qwen2.5-3B-Instruct`        | Used when `INFERENCE_BACKEND=transformers` |


### Optional: transformers backend

Heavier install; only needed if you switch away from llama.cpp:

```bash
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
  uv run --package gradio-space python -m gradio_space.app
```

---

## Gradio SDK local smoke test (matches HF Space build)

Before pushing to Hugging Face, verify the Gradio SDK entry point:

```bash
python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
```

Open [http://localhost:7860](http://localhost:7860). Studio is at `/`, Classic at `/classic`.

Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`.

---

## Hugging Face Space deployment (Gradio SDK + ZeroGPU)

The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`).

### 1. Push code to GitHub

Make sure `main` contains at minimum:

- `app.py`, `requirements.txt`, `packages.txt`
- `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`)
- `models.yaml`, `skills/`
- `apps/gradio-space/` and all `libs/*` packages

The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below).
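
Before pushing, the README frontmatter fields listed above can be checked with a short script. This is a simplified sketch under stated assumptions: it only reads flat `key: value` lines rather than full YAML, and `read_frontmatter` and `check_gradio_space` are hypothetical names, not part of the repo.

```python
def read_frontmatter(readme_text):
    """Parse flat `key: value` pairs from a README's leading YAML block.

    Simplified on purpose: nested YAML, lists, and multi-line values are ignored.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip indented (nested) lines and comments.
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat the block as malformed


def check_gradio_space(fields):
    """Return a list of problems for a Gradio SDK Space card."""
    problems = []
    if fields.get("sdk") != "gradio":
        problems.append("sdk must be 'gradio'")
    if fields.get("app_file") != "app.py":
        problems.append("app_file must be 'app.py'")
    if "sdk_version" not in fields:
        problems.append("sdk_version is missing")
    return problems
```

An empty list from `check_gradio_space(read_frontmatter(text))` means the three fields this guide relies on are present; it does not validate anything else in the card.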

### 2. Create the Space

1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
2. **New Space**
3. Name: e.g. `lesson-agent` or `small-model-hackathon`
4. SDK: **Gradio** (Blank template)
5. Hardware: **ZeroGPU** (creator needs PRO/Team) or **GPU basic**
6. Link your GitHub repo, or push directly to the Space git remote

CLI alternative (if you have `hf` installed and org access):

```bash
hf repo create build-small-hackathon/<your-space-name> \
  --repo-type space \
  --space_sdk gradio
```

### 3. Set Space environment variables

In the Space **Settings → Variables and secrets**:

| Variable | Value |
| -------- | ----- |
| `ACTIVE_MODEL` | `minicpm5-1b` |
| `ALLOW_MODEL_SWITCH` | `false` |
| `RESEARCHMIND_DATA_DIR` | `/tmp/researchmind` |

Default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers), which is suitable for ZeroGPU.

### 4. Build and verify

HF installs from `requirements.txt` and runs root `app.py`. Check the **Logs** tab for:

- Successful pip install (the first build may take several minutes while `llama-cpp-python` compiles)
- `Running on local URL: 0.0.0.0:7860`

Smoke test on the live Space:

1. **`/`**: Studio UI loads
2. **`/classic`**: all tabs render
3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
4. First LLM request may be slow (model download + ZeroGPU queue)
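
The first two checks above can be scripted with the standard library. This is a minimal sketch: `http_status` and `check_routes` are hypothetical helpers, and a 200 response only shows that a route is served, not that the UI rendered correctly.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status for a GET request, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None


def check_routes(base_url, paths=("/", "/classic"), fetch=http_status):
    """Return {path: status} for each route under base_url."""
    return {path: fetch(base_url.rstrip("/") + path) for path in paths}
```

For a local run, `check_routes("http://localhost:7860")` should report 200 for both paths once the app is up; for a live Space, pass that Space's URL instead.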

### 5. ZeroGPU notes

LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see **No CUDA GPUs are available**, an inference path is running outside a decorated handler.

Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.
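
As an illustration of the pattern, a wrapper like `gpu_task` might look as follows. This is a guess at the shape, not a copy of `spaces_runtime.py`: only `spaces.GPU` is a real API here, and the local fallback behavior is an assumption.

```python
import functools

try:
    import spaces  # provided on Hugging Face ZeroGPU hardware
except ImportError:  # local dev without the `spaces` package
    spaces = None


def gpu_task(fn=None, **gpu_kwargs):
    """Decorate an inference entry point so it runs inside a ZeroGPU task.

    Hypothetical sketch: calls `fn` directly when `spaces` is not installed.
    Supports both `@gpu_task` and `@gpu_task(duration=...)`.
    """
    def decorate(func):
        if spaces is not None:
            if gpu_kwargs:
                return spaces.GPU(**gpu_kwargs)(func)
            return spaces.GPU(func)

        @functools.wraps(func)
        def passthrough(*args, **kwargs):
            return func(*args, **kwargs)

        return passthrough

    return decorate if fn is None else decorate(fn)


@gpu_task
def generate(prompt):
    # Model loading and inference would go here, inside the GPU task.
    return f"echo: {prompt}"
```

Any code path that touches CUDA without passing through such a decorated function is the likely source of the **No CUDA GPUs are available** error.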

### 6. Optional: persistent model cache

Attach a **Storage Bucket** in Space settings so Hub model weights survive restarts.

---

## Docker SDK deployment (later)

Both deploy paths live on the same branch. HF reads **one** `sdk:` from README, so switch to Docker when you are ready for a dedicated-GPU Space.

1. Change [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`)
2. Create or reconfigure a Space with **Docker** SDK and **GPU basic** hardware
3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.)

### Local Docker smoke test

```bash
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
  -e ACTIVE_MODEL=minicpm5-1b \
  -e ALLOW_MODEL_SWITCH=false \
  -e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
  hackathon-space
```

Open [http://localhost:7860](http://localhost:7860). Studio is at `/`, Classic tabs at `/classic`. Stop with `Ctrl+C`.

To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`:

```bash
docker run --rm -p 7860:7860 \
  -v "$(pwd)/models:/app/models:ro" \
  -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
  hackathon-space
```

---

## Troubleshooting


| Symptom                                  | Likely cause                      | Fix                                                                  |
| ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- |
| First chat hangs / slow                  | Model downloading from Hub        | Wait on Space; use Storage Bucket for cache                            |
| `Failed to load model` in chat           | Wrong `ACTIVE_MODEL` preset       | Use `minicpm5-1b` or valid key from `models.yaml`                    |
| Space build fails on pip install         | `llama-cpp-python` compile        | Check Logs; default preset avoids GGUF at runtime                    |
| Space build fails                        | Malformed README YAML             | Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter    |
| No CUDA GPUs on ZeroGPU                  | Handler outside `@spaces.GPU`     | LLM entry points must use `gpu_task` in `spaces_runtime.py`          |
| Docker build fails on `llama-cpp-python` | Missing build tools               | Dockerfile installs `build-essential` and `cmake`                    |
| Port already in use locally              | Another process on 7860           | `PORT=7861 python app.py` or `uv run ...`                            |
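
For the port conflict in the last row, a small helper can pick the value to pass as `PORT`. This is a sketch with hypothetical function names; it only checks whether something is listening on the loopback address, so another process could still claim the port before the app starts.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=7860, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The result can then be exported before launching, for example as the `PORT` value shown in the table.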


---

## Entrypoint summary

| Environment | How to run |
| ----------- | ---------- |
| Local dev (uv) | `uv run --package gradio-space python -m gradio_space.app` |
| Local Gradio SDK smoke | `pip install -r requirements.txt && python app.py` |
| HF Gradio Space | HF runs root `app.py` automatically |
| Docker (later) | `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) |