Reconcile README with code: MiniCPM-V 4.6 (multimodal/OCR), fix launch + /gradio claims, correct dep pins, add submission tags + team

#1
by nz-nz - opened
Files changed (1) hide show
  1. README.md +51 -29
README.md CHANGED
@@ -4,22 +4,35 @@ emoji: 📚
4
  colorFrom: indigo
5
  colorTo: green
6
  sdk: gradio
7
- sdk_version: 6.17.3
8
  app_file: server.py
9
  pinned: false
10
  license: mit
 
 
 
 
 
11
  ---
12
 
13
  # 📚 Recall — an AI study partner that gets smarter about what you get wrong
14
 
15
- Upload your study material Recall generates a quiz deck you answer a small
16
- model grades and explains each answer → **it generates new questions targeting
17
- exactly what you missed** end-of-session recap. Built for the **Build Small
18
- Hackathon** (Backyard AI track).
19
 
20
- - **Model:** [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) (fallback: MiniCPM5-1B)
21
  - **Platform:** Gradio app, hosted as a Hugging Face Space
22
 
 
 
 
 
 
 
 
 
23
  ## Run it (stub mode — no GPU, no model download)
24
 
25
  ```bash
@@ -32,9 +45,11 @@ the full loop in minute one.
32
 
33
  `server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
34
  API over the existing backend — the learning/content logic and the `schema.py`
35
- data contract are treated as an API and are never modified. The original Gradio
36
- form is still available as a fallback at `/gradio` (and standalone via
37
- `python app.py`).
 
 
38
 
39
  ## Run with the real model
40
 
@@ -46,14 +61,17 @@ pip install -r requirements-model.txt
46
  RECALL_STUB=0 python server.py
47
  ```
48
 
49
- > **Dependency pins (why they're tight).** MiniCPM4.1-8B's `trust_remote_code`
50
- > imports symbols removed in **transformers 5.x**, so the real model needs
51
- > `transformers >=4.55,<5.0`. That in turn requires `huggingface-hub <1.0`, which
52
- > **gradio 6.18 forbids** (it needs `hub >=1.2`) so `requirements.txt` and the
53
- > Space `sdk_version` are pinned to **gradio 6.17.3** (the newest gradio that
54
- > still allows `hub <1.0`). Because a gradio-SDK Space force-installs one gradio
55
- > for the whole Space, stub and real-model share it; 6.17.3 keeps both working
56
- > without a Docker Space. The 1B fallback has no such constraint.
 
 
 
57
 
58
  **On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
59
  output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
@@ -65,23 +83,26 @@ RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python serv
65
 
66
  ## The model
67
 
68
- Recall runs on **[openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B)**, an 8B open model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers and write grounded follow-up questions.
69
 
70
- **Where the model is load-bearing.** Two user-visible features are pure model work, not templated strings:
71
  - **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
72
  - **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
 
73
 
74
- **How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once (lazily, via `AutoModelForCausalLM` in `bf16` with `device_map="auto"`) on the Space's ZeroGPU, with the GPU entrypoint wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
75
 
76
  **Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
77
 
78
- **Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged:
79
 
80
  ```bash
 
 
81
  # fast fallback
82
- RECALL_MODEL=openbmb/MiniCPM5-1B RECALL_STUB=0 python app.py
83
- # mid fallback (also earns the Tiny Titan badge)
84
- RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
85
  ```
86
 
87
  ## Project layout
@@ -91,8 +112,8 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
91
  | `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
92
  | `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
93
  | `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
94
- | `content_pipeline.py` | Frank | PDF/text → chunks → question cards. |
95
- | `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — fallback at `/gradio`. |
96
  | `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
97
  | `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
98
 
@@ -103,6 +124,7 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
103
  3. Don't change public function signatures without telling the team.
104
 
105
  ## The judging hook
106
- The small model is load-bearing in two visible places: **grading free-text
107
- answers with explanations**, and **generating follow-up questions that drill the
108
- exact concept you missed**. Make sure the demo shows both.
 
 
4
  colorFrom: indigo
5
  colorTo: green
6
  sdk: gradio
7
+ sdk_version: 6.10.0
8
  app_file: server.py
9
  pinned: false
10
  license: mit
11
+ tags:
12
+ - track:backyard
13
+ - sponsor:openbmb
14
+ - achievement:offgrid
15
+ - achievement:offbrand
16
  ---
17
 
18
  # 📚 Recall — an AI study partner that gets smarter about what you get wrong
19
 
20
+ Upload your study material typed notes, a PDF, even a photo or scan of a page →
21
+ Recall generates a quiz deck → you answer → a small model grades and explains each
22
+ answer **it generates new questions targeting exactly what you missed**
23
+ end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).
24
 
25
+ - **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
26
  - **Platform:** Gradio app, hosted as a Hugging Face Space
27
 
28
+ ## Team
29
+
30
+ | Member | Hugging Face |
31
+ |--------|--------------|
32
+ | Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
33
+ | Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
34
+ | Arturo | [@arturogp3](https://huggingface.co/arturogp3) |
35
+
36
  ## Run it (stub mode — no GPU, no model download)
37
 
38
  ```bash
 
45
 
46
  `server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
47
  API over the existing backend — the learning/content logic and the `schema.py`
48
+ data contract are treated as an API and are never modified. It's built on
49
+ `gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
50
+ gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
51
+ port 7860 directly while the main thread is held open. The original Gradio form is
52
+ still available standalone via `python app.py`.
53
 
54
  ## Run with the real model
55
 
 
61
  RECALL_STUB=0 python server.py
62
  ```
63
 
64
+ > **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
65
+ > custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
66
+ > `Server` breaks under a Space's runtime (app starts, process exits →
67
+ > `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
68
+ > reference example ships and runs cleanly. It also resolves with the real model:
69
+ > MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
70
+ > and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
71
+ > Space force-installs one gradio for the whole Space, so stub and real-model
72
+ > share it without a Docker Space — keep `requirements.txt`,
73
+ > `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
74
+ > text fallbacks add no extra constraint.
75
 
76
  **On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
77
  output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
 
83
 
84
  ## The model
85
 
86
+ Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.
87
 
88
+ **Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
89
  - **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
90
  - **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
91
+ - **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.
92
 
93
+ **How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
94
 
95
  **Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
96
 
97
+ **Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):
98
 
99
  ```bash
100
+ # text fallback (8B)
101
+ RECALL_MODEL=8b RECALL_STUB=0 python server.py # MiniCPM4.1-8B
102
  # fast fallback
103
+ RECALL_MODEL=1b RECALL_STUB=0 python server.py # MiniCPM5-1B
104
+ # mid fallback ≤4B, so it qualifies for the Tiny Titan prize
105
+ RECALL_MODEL=4b RECALL_STUB=0 python server.py # MiniCPM3-4B
106
  ```
107
 
108
  ## Project layout
 
112
  | `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
113
  | `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
114
  | `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
115
+ | `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
116
+ | `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
117
  | `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
118
  | `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
119
 
 
124
  3. Don't change public function signatures without telling the team.
125
 
126
  ## The judging hook
127
+ The small model is load-bearing in three visible places: **grading free-text
128
+ answers with explanations**, **generating follow-up questions that drill the
129
+ exact concept you missed**, and **reading scanned/photographed material** to build
130
+ the deck. Make sure the demo shows them.