Spaces:

build-small-hackathon
/

recall

Runtime error

App Files Files Community

Reconcile README with code: MiniCPM-V 4.6 (multimodal/OCR), fix launch + /gradio claims, correct dep pins, add submission tags + team

by nz-nz - opened 19 days ago

base: refs/heads/main

←

from: refs/pr/1

Discussion Files changed

+51

-29

Files changed (1) hide show

README.md +51 -29

README.md CHANGED Viewed

@@ -4,22 +4,35 @@ emoji: 📚
 colorFrom: indigo
 colorTo: green
 sdk: gradio
-sdk_version: 6.17.3
 app_file: server.py
 pinned: false
 license: mit
 ---
 # 📚 Recall — an AI study partner that gets smarter about what you get wrong
-Upload your study material → Recall generates a quiz deck → you answer → a small
-model grades and explains each answer → **it generates new questions targeting
-exactly what you missed** → end-of-session recap. Built for the **Build Small
-Hackathon** (Backyard AI track).
-- **Model:** [openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B) (fallback: MiniCPM5-1B)
 - **Platform:** Gradio app, hosted as a Hugging Face Space
 ## Run it (stub mode — no GPU, no model download)
 ```bash
@@ -32,9 +45,11 @@ the full loop in minute one.
 `server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
 API over the existing backend — the learning/content logic and the `schema.py`
-data contract are treated as an API and are never modified. The original Gradio
-form is still available as a fallback at `/gradio` (and standalone via
-`python app.py`).
 ## Run with the real model
@@ -46,14 +61,17 @@ pip install -r requirements-model.txt
 RECALL_STUB=0 python server.py
 ```
-> **Dependency pins (why they're tight).** MiniCPM4.1-8B's `trust_remote_code`
-> imports symbols removed in **transformers 5.x**, so the real model needs
-> `transformers >=4.55,<5.0`. That in turn requires `huggingface-hub <1.0`, which
-> **gradio 6.18 forbids** (it needs `hub >=1.2`) — so `requirements.txt` and the
-> Space `sdk_version` are pinned to **gradio 6.17.3** (the newest gradio that
-> still allows `hub <1.0`). Because a gradio-SDK Space force-installs one gradio
-> for the whole Space, stub and real-model share it; 6.17.3 keeps both working
-> without a Docker Space. The 1B fallback has no such constraint.
 **On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
 output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
@@ -65,23 +83,26 @@ RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python serv
 ## The model
-Recall runs on **[openbmb/MiniCPM4.1-8B](https://huggingface.co/openbmb/MiniCPM4.1-8B)**, an 8B open model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers and write grounded follow-up questions.
-**Where the model is load-bearing.** Two user-visible features are pure model work, not templated strings:
 - **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
 - **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
-**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once (lazily, via `AutoModelForCausalLM` in `bf16` with `device_map="auto"`) on the Space's ZeroGPU, with the GPU entrypoint wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
 **Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
-**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged:
 ```bash
 # fast fallback
-RECALL_MODEL=openbmb/MiniCPM5-1B RECALL_STUB=0 python app.py
-# mid fallback (also earns the Tiny Titan badge)
-RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
 ```
 ## Project layout
@@ -91,8 +112,8 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
 | `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
 | `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
 | `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
-| `content_pipeline.py` | Frank | PDF/text → chunks → question cards. |
-| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — fallback at `/gradio`. |
 | `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
 | `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
@@ -103,6 +124,7 @@ RECALL_MODEL=openbmb/MiniCPM3-4B RECALL_STUB=0 python app.py
 3. Don't change public function signatures without telling the team.
 ## The judging hook
-The small model is load-bearing in two visible places: **grading free-text
-answers with explanations**, and **generating follow-up questions that drill the
-exact concept you missed**. Make sure the demo shows both.

 colorFrom: indigo
 colorTo: green
 sdk: gradio
+sdk_version: 6.10.0
 app_file: server.py
 pinned: false
 license: mit
+tags:
+- track:backyard
+- sponsor:openbmb
+- achievement:offgrid
+- achievement:offbrand
 ---
 # 📚 Recall — an AI study partner that gets smarter about what you get wrong
+Upload your study material — typed notes, a PDF, even a photo or scan of a page →
+Recall generates a quiz deck → you answer → a small model grades and explains each
+answer → **it generates new questions targeting exactly what you missed** →
+end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).
+- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
 - **Platform:** Gradio app, hosted as a Hugging Face Space
+## Team
+| Member | Hugging Face |
+|--------|--------------|
+| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
+| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
+| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |
 ## Run it (stub mode — no GPU, no model download)
 ```bash
 `server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
 API over the existing backend — the learning/content logic and the `schema.py`
+data contract are treated as an API and are never modified. It's built on
+`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
+gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
+port 7860 directly while the main thread is held open. The original Gradio form is
+still available standalone via `python app.py`.
 ## Run with the real model
 RECALL_STUB=0 python server.py
 ```
+> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
+> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
+> `Server` breaks under a Space's runtime (app starts, process exits →
+> `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
+> reference example ships and runs cleanly. It also resolves with the real model:
+> MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
+> and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
+> Space force-installs one gradio for the whole Space, so stub and real-model
+> share it without a Docker Space — keep `requirements.txt`,
+> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
+> text fallbacks add no extra constraint.
 **On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
 output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
 ## The model
+Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.
+**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
 - **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
 - **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
+- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.
+**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
 **Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
+**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):
 ```bash
+# text fallback (8B)
+RECALL_MODEL=8b RECALL_STUB=0 python server.py   # MiniCPM4.1-8B
 # fast fallback
+RECALL_MODEL=1b RECALL_STUB=0 python server.py   # MiniCPM5-1B
+# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
+RECALL_MODEL=4b RECALL_STUB=0 python server.py   # MiniCPM3-4B
 ```
 ## Project layout
 | `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
 | `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
 | `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
+| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
+| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
 | `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
 | `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
 3. Don't change public function signatures without telling the team.
 ## The judging hook
+The small model is load-bearing in three visible places: **grading free-text
+answers with explanations**, **generating follow-up questions that drill the
+exact concept you missed**, and **reading scanned/photographed material** to build
+the deck. Make sure the demo shows them.