Spaces:
Runtime error
Runtime error
Reconcile README with code: MiniCPM-V 4.6 (multimodal/OCR), fix launch + /gradio claims, correct dep pins, add submission tags + team
#1
by nz-nz - opened
README.md
CHANGED
|
@@ -4,22 +4,35 @@ emoji: 📚
|
|
| 4 |
colorFrom: indigo
|
| 5 |
colorTo: green
|
| 6 |
sdk: gradio
|
| 7 |
-
sdk_version: 6.
|
| 8 |
app_file: server.py
|
| 9 |
pinned: false
|
| 10 |
license: mit
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
---
|
| 12 |
|
| 13 |
# 📚 Recall — an AI study partner that gets smarter about what you get wrong
|
| 14 |
|
| 15 |
-
Upload your study material
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
Hackathon** (Backyard AI track).
|
| 19 |
|
| 20 |
-
- **Model:** [openbmb/
|
| 21 |
- **Platform:** Gradio app, hosted as a Hugging Face Space
|
| 22 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
## Run it (stub mode — no GPU, no model download)
|
| 24 |
|
| 25 |
```bash
|
|
@@ -32,9 +45,11 @@ the full loop in minute one.
|
|
| 32 |
|
| 33 |
`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
|
| 34 |
API over the existing backend — the learning/content logic and the `schema.py`
|
| 35 |
-
data contract are treated as an API and are never modified.
|
| 36 |
-
|
| 37 |
-
|
|
|
|
|
|
|
| 38 |
|
| 39 |
## Run with the real model
|
| 40 |
|
|
@@ -46,14 +61,17 @@ pip install -r requirements-model.txt
|
|
| 46 |
RECALL_STUB=0 python server.py
|
| 47 |
```
|
| 48 |
|
| 49 |
-
> **Dependency pins (why
|
| 50 |
-
>
|
| 51 |
-
> `
|
| 52 |
-
> **gradio 6.
|
| 53 |
-
>
|
| 54 |
-
>
|
| 55 |
-
>
|
| 56 |
-
>
|
|
|
|
|
|
|
|
|
|
| 57 |
|
| 58 |
**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
|
| 59 |
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
|
|
@@ -65,23 +83,26 @@ RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python serv
|
|
| 65 |
|
| 66 |
## The model
|
| 67 |
|
| 68 |
-
Recall runs on **[openbmb/
|
| 69 |
|
| 70 |
-
**Where the model is load-bearing.**
|
| 71 |
- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
|
| 72 |
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
|
|
|
|
| 73 |
|
| 74 |
-
**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once
|
| 75 |
|
| 76 |
**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
|
| 77 |
|
| 78 |
-
**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged:
|
| 79 |
|
| 80 |
```bash
|
|
|
|
|
|
|
| 81 |
# fast fallback
|
| 82 |
-
RECALL_MODEL=
|
| 83 |
-
# mid fallback
|
| 84 |
-
RECALL_MODEL=
|
| 85 |
```
|
| 86 |
|
| 87 |
## Project layout
|
|
@@ -91,8 +112,8 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
|
|
| 91 |
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
|
| 92 |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
|
| 93 |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
|
| 94 |
-
| `content_pipeline.py` | Frank |
|
| 95 |
-
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — fallback
|
| 96 |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
|
| 97 |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
|
| 98 |
|
|
@@ -103,6 +124,7 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
|
|
| 103 |
3. Don't change public function signatures without telling the team.
|
| 104 |
|
| 105 |
## The judging hook
|
| 106 |
-
The small model is load-bearing in
|
| 107 |
-
answers with explanations**,
|
| 108 |
-
exact concept you missed**
|
|
|
|
|
|
| 4 |
colorFrom: indigo
|
| 5 |
colorTo: green
|
| 6 |
sdk: gradio
|
| 7 |
+
sdk_version: 6.10.0
|
| 8 |
app_file: server.py
|
| 9 |
pinned: false
|
| 10 |
license: mit
|
| 11 |
+
tags:
|
| 12 |
+
- track:backyard
|
| 13 |
+
- sponsor:openbmb
|
| 14 |
+
- achievement:offgrid
|
| 15 |
+
- achievement:offbrand
|
| 16 |
---
|
| 17 |
|
| 18 |
# 📚 Recall — an AI study partner that gets smarter about what you get wrong
|
| 19 |
|
| 20 |
+
Upload your study material — typed notes, a PDF, even a photo or scan of a page →
|
| 21 |
+
Recall generates a quiz deck → you answer → a small model grades and explains each
|
| 22 |
+
answer → **it generates new questions targeting exactly what you missed** →
|
| 23 |
+
end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).
|
| 24 |
|
| 25 |
+
- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
|
| 26 |
- **Platform:** Gradio app, hosted as a Hugging Face Space
|
| 27 |
|
| 28 |
+
## Team
|
| 29 |
+
|
| 30 |
+
| Member | Hugging Face |
|
| 31 |
+
|--------|--------------|
|
| 32 |
+
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
|
| 33 |
+
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
|
| 34 |
+
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |
|
| 35 |
+
|
| 36 |
## Run it (stub mode — no GPU, no model download)
|
| 37 |
|
| 38 |
```bash
|
|
|
|
| 45 |
|
| 46 |
`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
|
| 47 |
API over the existing backend — the learning/content logic and the `schema.py`
|
| 48 |
+
data contract are treated as an API and are never modified. It's built on
|
| 49 |
+
`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
|
| 50 |
+
gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
|
| 51 |
+
port 7860 directly while the main thread is held open. The original Gradio form is
|
| 52 |
+
still available standalone via `python app.py`.
|
| 53 |
|
| 54 |
## Run with the real model
|
| 55 |
|
|
|
|
| 61 |
RECALL_STUB=0 python server.py
|
| 62 |
```
|
| 63 |
|
| 64 |
+
> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
|
| 65 |
+
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
|
| 66 |
+
> `Server` breaks under a Space's runtime (app starts, process exits →
|
| 67 |
+
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
|
| 68 |
+
> reference example ships and runs cleanly. It also resolves with the real model:
|
| 69 |
+
> MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
|
| 70 |
+
> and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
|
| 71 |
+
> Space force-installs one gradio for the whole Space, so stub and real-model
|
| 72 |
+
> share it without a Docker Space — keep `requirements.txt`,
|
| 73 |
+
> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
|
| 74 |
+
> text fallbacks add no extra constraint.
|
| 75 |
|
| 76 |
**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
|
| 77 |
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
|
|
|
|
| 83 |
|
| 84 |
## The model
|
| 85 |
|
| 86 |
+
Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.
|
| 87 |
|
| 88 |
+
**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
|
| 89 |
- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
|
| 90 |
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
|
| 91 |
+
- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.
|
| 92 |
|
| 93 |
+
**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
|
| 94 |
|
| 95 |
**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
|
| 96 |
|
| 97 |
+
**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):
|
| 98 |
|
| 99 |
```bash
|
| 100 |
+
# text fallback (8B)
|
| 101 |
+
RECALL_MODEL=8b RECALL_STUB=0 python server.py # MiniCPM4.1-8B
|
| 102 |
# fast fallback
|
| 103 |
+
RECALL_MODEL=1b RECALL_STUB=0 python server.py # MiniCPM5-1B
|
| 104 |
+
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
|
| 105 |
+
RECALL_MODEL=4b RECALL_STUB=0 python server.py # MiniCPM3-4B
|
| 106 |
```
|
| 107 |
|
| 108 |
## Project layout
|
|
|
|
| 112 |
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
|
| 113 |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
|
| 114 |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
|
| 115 |
+
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
|
| 116 |
+
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
|
| 117 |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
|
| 118 |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
|
| 119 |
|
|
|
|
| 124 |
3. Don't change public function signatures without telling the team.
|
| 125 |
|
| 126 |
## The judging hook
|
| 127 |
+
The small model is load-bearing in three visible places: **grading free-text
|
| 128 |
+
answers with explanations**, **generating follow-up questions that drill the
|
| 129 |
+
exact concept you missed**, and **reading scanned/photographed material** to build
|
| 130 |
+
the deck. Make sure the demo shows them.
|