Spaces:
Sleeping
Sleeping
File size: 18,563 Bytes
bf15bc3 871f869 a31982f aac5f23 bf15bc3 a31982f bf15bc3 a31982f bf15bc3 a31982f bf15bc3 b16996a bf15bc3 81da2d5 a31982f 727cb75 83e828a 1719c2a aac5f23 29e2c18 9939b9d aac5f23 9939b9d 1719c2a 9939b9d 29e2c18 9939b9d aac5f23 9939b9d aac5f23 9939b9d aac5f23 9939b9d aac5f23 9939b9d 29e2c18 a31982f bf15bc3 b16996a bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 b16996a 871f869 b16996a 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 b16996a 871f869 b16996a 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 871f869 bf15bc3 b16996a 871f869 b16996a bf15bc3 871f869 b16996a | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 | # Usage
How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
The primary UI is the **Lesson slides** tab (topic β local model outline β downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
## Prerequisites
- [uv](https://docs.astral.sh/uv/) installed
- Python 3.12 (see `.python-version`)
- For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org
## Local development
### 1. Install dependencies
```bash
uv sync --all-packages
```
### 2. Configure environment (optional)
```bash
cp .env.example .env
```
Edit `.env` if you want a different model preset. Default is `minicpm5-1b` (transformers).
### 3. Pre-download the model (optional for GGUF presets)
If using a GGUF preset (`qwen3b-gguf`), pre-download avoids a long wait on first use:
```bash
uv run python scripts/download_model.py
```
Then add the printed path to `.env`:
```bash
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
```
### 4. Run the Gradio app
```bash
uv run --package gradio-space python -m gradio_space.app
```
Open [http://localhost:7860](http://localhost:7860).
| URL | UI |
|-----|-----|
| `/` | **Studio** β custom HTML/CSS/JS workspace (Off Brand entry) |
| `/classic` | **Classic** β full Gradio tabs, settings, Chat (debug) |
The header in Classic includes a link back to Studio UI.
The model loads on the **first Generate** (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates.
### Switching models locally (transformers β llama.cpp)
For local dev you can switch presets at runtime without restarting:
```bash
# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers)
```
| UI | Where to switch |
|----|-----------------|
| **Classic** (`/classic`) | **Settings** accordion β Model preset dropdown (reloads on change) |
| **Classic** Chat tab | Model preset dropdown (syncs app-wide) |
| **Studio** (`/`) | Settings drawer β Model preset; Debug tab has the same list |
| Goal | Preset key |
|------|------------|
| MiniCPM-V 4.6 transformers (full VLM) | `minicpm-v-4.6` |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | `minicpm-v-4.6-gguf` |
| MiniCPM5 1B text | `minicpm5-1b` |
| Lesson LoRA (transformers only) | `minicpm5-1b-lesson-lora` |
Prefetch the GGUF weights (optional):
```bash
uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
```
On Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`.
### Lesson slides β research sources
The **Lesson slides** tab can ground outlines on external sources before building the deck:
| Source mode | What it does |
| ----------- | ------------ |
| **None (model only)** | Default β outline from the local model only |
| **Web search** | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| **RAG (indexed sources)** | Use a **ResearchMind session** and/or URLs/files you provide on this tab |
When **Web search** is selected, choose a **search workflow**:
| Workflow | Steps |
| -------- | ----- |
| **Two-step search (suggest & confirm)** | Click **Discover sources** β select URLs β **Generate lesson slides** |
| **Auto search & ingest** | Click **Generate lesson slides** only β search, ingest, and outline in one step |
**RAG** mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.
Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).
Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).
### EchoCoach β voice practice
The **EchoCoach** tab records up to 30 seconds, then runs a local pipeline:
**Getting audio in**
- **Record from this computer** β click **Start recording**, speak, then **Stop recording** (uses PipeWire `pw-record` when available). The slider is a max-length safety cap.
- **Browser Record** β needs mic permission and a secure context; open **http://localhost:7860** (not `0.0.0.0` or a LAN IP).
- **Upload** β drop a `.wav` or `.mp3` file (works everywhere, including HF Space).
If recordings sound silent, check system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`).
Pipeline steps:
1. **ASR** β Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
2. **Analysis** β filler highlights, pace score, matplotlib charts
3. **Coach** β rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`)
4. **VoiceOut** β Piper TTS speaks the summary (or full rewrite if checked)
Install optional extras:
```bash
# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper
# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium
```
Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
| Variable | Default | Description |
| -------- | ------- | ----------- |
| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo |
| `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
| `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
| `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
**Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face β run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.
Smoke tests (analysis only, no GPU):
```bash
bash scripts/echo_coach_smoke.sh
```
### Language lessons β multilingual coach (Studio tab)
The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.
| Input | Output |
| ----- | ------ |
| Type a question | Chat bubble in target language |
| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
| **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |
**Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR β [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) β Piper or VibeVoice Realtime for speech out.
Rebuild training JSONL from Hugging Face sources:
```bash
uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
```
Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).
Default `.env` / Space secrets:
```bash
ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
```
| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content |
Turn-based (not full duplex): speak β wait β hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice.
Pitch metrics and monologue analysis live in **Classic UI β EchoCoach** (`/classic`).
### TeacherVoice β Classic UI (turn-based)
The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher β same pipeline as Language lessons, plus **Pitch practice** mode.
| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content verbally |
| **Pitch practice** | Short live speaking tips each turn |
**EchoCoach vs TeacherVoice**
| | EchoCoach | TeacherVoice |
| --- | --- | --- |
| Interaction | One-shot after **Analyze pitch** | Multi-turn **Send turn** |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |
**Flow per turn:** record up to **15s** β ASR β text LLM with chat history β Piper TTS (auto-plays when installed).
After each reply, use **Speak last reply** or **Speak first sentence** to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).
Install Piper for voice output (included in `gradio-space` deps after `uv sync`):
```bash
uv sync
python -m piper.download_voices en_US-lessac-medium
```
Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. **Restart the Gradio app** after installing Piper so the Speak buttons can synthesize audio.
**Realtime TTS (VibeVoice)** β [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text-in). TeacherVoice uses `realtime_tts_preset` from YAML by default; override with `ECHOCOACH_REALTIME_TTS_PRESET` or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.
Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).
Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`.
Optional omni profile (GPU, experimental β falls back to ASR+LLM+Piper):
```bash
ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
```
Unit tests (no GPU):
```bash
uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
```
### 5. Upload agent trace (Sharing is Caring badge)
```bash
uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
```
### 5. Quick sanity checks
```bash
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"
# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
```
### Local env reference
| Variable | Default | Description |
| ------------------- | --------------------------------- | ------------------------------------------ |
| `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
| `MODEL_PATH` | β | Local GGUF path (skips Hub download) |
| `N_CTX` | `4096` | Context window |
| `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
| `PORT` | `7860` | Gradio listen port |
| `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
### Optional: transformers backend
Heavier install; only needed if you switch away from llama.cpp:
```bash
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
uv run --package gradio-space python -m gradio_space.app
```
---
## Gradio SDK local smoke test (matches HF Space build)
Before pushing to Hugging Face, verify the Gradio SDK entry point:
```bash
python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
```
Open [http://localhost:7860](http://localhost:7860) β Studio at `/`, Classic at `/classic`.
Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`.
---
## Hugging Face Space deployment (Gradio SDK + ZeroGPU)
The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`).
### 1. Push code to GitHub
Make sure `main` contains at minimum:
- `app.py`, `requirements.txt`, `packages.txt`
- `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`)
- `models.yaml`, `skills/`
- `apps/gradio-space/` and all `libs/*` packages
The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below).
### 2. Create the Space
1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
2. **New Space**
3. Name: e.g. `lesson-agent` or `small-model-hackathon`
4. SDK: **Gradio** (Blank template)
5. Hardware: **ZeroGPU** (creator needs PRO/Team) or **GPU basic**
6. Link your GitHub repo, or push directly to the Space git remote
CLI alternative (if you have `hf` installed and org access):
```bash
hf repo create build-small-hackathon/<your-space-name> \
--repo-type space \
--space_sdk gradio
```
### 3. Set Space environment variables
In the Space **Settings β Variables and secrets**:
| Variable | Value |
| -------- | ----- |
| `ACTIVE_MODEL` | `minicpm5-1b` |
| `ALLOW_MODEL_SWITCH` | `false` |
| `RESEARCHMIND_DATA_DIR` | `/tmp/researchmind` |
Default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers) β suitable for ZeroGPU.
### 4. Build and verify
HF installs from `requirements.txt` and runs root `app.py`. Check the **Logs** tab for:
- Successful pip install (first build may take several minutes β `llama-cpp-python` compiles)
- `Running on local URL: 0.0.0.0:7860`
Smoke test on the live Space:
1. **`/`** β Studio UI loads
2. **`/classic`** β all tabs render
3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
4. First LLM request may be slow (model download + ZeroGPU queue)
### 5. ZeroGPU notes
LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see **No CUDA GPUs are available**, an inference path is running outside a decorated handler.
Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.
### 6. Optional: persistent model cache
Attach a **Storage Bucket** in Space settings so Hub model weights survive restarts.
---
## Docker SDK deployment (later)
Both deploy paths live on the same branch. HF reads **one** `sdk:` from README β switch to Docker when you are ready for a dedicated-GPU Space.
1. Change [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`)
2. Create or reconfigure a Space with **Docker** SDK and **GPU basic** hardware
3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.)
### Local Docker smoke test
```bash
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
-e ACTIVE_MODEL=minicpm5-1b \
-e ALLOW_MODEL_SWITCH=false \
-e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
hackathon-space
```
Open [http://localhost:7860](http://localhost:7860) β Studio at `/`, Classic tabs at `/classic`. Stop with `Ctrl+C`.
To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`:
```bash
docker run --rm -p 7860:7860 \
-v "$(pwd)/models:/app/models:ro" \
-e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
hackathon-space
```
---
## Troubleshooting
| Symptom | Likely cause | Fix |
| ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- |
| First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache |
| `Failed to load model` in chat | Wrong `ACTIVE_MODEL` preset | Use `minicpm5-1b` or valid key from `models.yaml` |
| Space build fails on pip install | `llama-cpp-python` compile | Check Logs; default preset avoids GGUF at runtime |
| Space build fails | Malformed README YAML | Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter |
| No CUDA GPUs on ZeroGPU | Handler outside `@spaces.GPU` | LLM entry points must use `gpu_task` in `spaces_runtime.py` |
| Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile installs `build-essential` and `cmake` |
| Port already in use locally | Another process on 7860 | `PORT=7861 python app.py` or `uv run ...` |
---
## Entrypoint summary
| Environment | How to run |
| ----------- | ---------- |
| Local dev (uv) | `uv run --package gradio-space python -m gradio_space.app` |
| Local Gradio SDK smoke | `pip install -r requirements.txt && python app.py` |
| HF Gradio Space | HF runs root `app.py` automatically |
| Docker (later) | `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) |
|