---
title: Recall — AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
---

# 📚 Recall — an AI study partner that gets smarter about what you get wrong

Upload your study material — typed notes, a PDF, even a photo or scan of a page →
Recall generates a quiz deck → you answer → a small model grades and explains each
answer → **it generates new questions targeting exactly what you missed** →
end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).

- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- **Platform:** Gradio app, hosted as a Hugging Face Space
- **Demo video:** [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- **Social post:** [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)

## Team

| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |

## Run it (stub mode — no GPU, no model download)

```bash
pip install -r requirements.txt
python server.py        # http://127.0.0.1:7860  ← polished custom frontend
```

Everything works end-to-end on canned data, so anyone can clone and click through
the full loop in minute one.

`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
API over the existing backend — the learning/content logic and the `schema.py`
data contract are treated as an API and are never modified. It's built on
`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
port 7860 directly while the main thread is held open. The original Gradio form is
still available standalone via `python app.py`.

## Run with the real model

The heavy model deps (torch/transformers/…) are kept out of `requirements.txt` so
the Space build stays fast in stub mode. Install them with the model requirements:

```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```

> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (app starts, process exits →
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version that gradio's own ZeroGPU
> `Server` reference example ships with, and it runs cleanly. It also resolves
> with the real model: MiniCPM-V 4.6 runs on **transformers 5.x**, which wants
> **huggingface-hub 1.x**, and gradio 6.10.0 allows
> `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK Space
> force-installs one gradio for the whole Space, so stub and real-model modes
> share it without a Docker Space. Keep `requirements.txt`,
> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
> text fallbacks add no extra constraint.

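
The lockstep rule above is easy to break by hand, so a small check may help. The sketch below is not part of the repo: the function names are hypothetical, and it assumes gradio is pinned as a plain `gradio==X.Y.Z` line in each requirements file and that the README front matter carries `sdk_version: X.Y.Z`.

```python
import re
from typing import Optional


def gradio_pin(requirements_text: str) -> Optional[str]:
    """Return the version from a `gradio==X.Y.Z` line, or None if there is none."""
    match = re.search(r"^gradio==([\w.]+)\s*$", requirements_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def sdk_version(readme_text: str) -> Optional[str]:
    """Return the `sdk_version` value from the README front matter, or None."""
    match = re.search(r"^sdk_version:\s*([\w.]+)\s*$", readme_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def pins_in_lockstep(readme_text: str, *requirements_texts: str) -> bool:
    """True only if every file pins gradio and all pins match the Space sdk_version."""
    versions = {sdk_version(readme_text)}
    versions.update(gradio_pin(text) for text in requirements_texts)
    return len(versions) == 1 and None not in versions
```

In practice the three arguments would be the contents of `README.md`, `requirements.txt` and `requirements-model.txt`, for example read with `pathlib.Path(...).read_text()`.
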

**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
a clean local real-model smoke test, force CPU/float32:

```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```

## The model

Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.

**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:

- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.

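
The text-versus-vision split in the last bullet can be sketched as a per-page routing decision. This is an illustration only, not the code in `content_pipeline.py`: `PagePlan`, `plan_pages` and the 40-character threshold are hypothetical, and the sketch assumes the text of each page has already been extracted by whatever PDF library the pipeline uses.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: pages with less extracted text than this are treated as scans.
MIN_TEXT_CHARS = 40


@dataclass
class PagePlan:
    page_number: int  # 1-based page index
    route: str        # "text" (use extracted text) or "vision" (render and send as image)


def plan_pages(page_texts: List[str], min_chars: int = MIN_TEXT_CHARS) -> List[PagePlan]:
    """Decide, page by page, whether extracted text is usable or the page needs the vision model."""
    plans = []
    for number, text in enumerate(page_texts, start=1):
        has_selectable_text = len(text.strip()) >= min_chars
        plans.append(PagePlan(number, "text" if has_selectable_text else "vision"))
    return plans
```

A mixed PDF (typed pages plus a few scanned ones) would then send only the scanned pages through the slower image path.
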

**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.

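
The "parse defensively, one repair retry, safe fallback" pattern can be sketched as follows. This is a minimal illustration, not the code in `llm.py`: the helper names, the repair prompt and the fallback field names (`score`, `explanation`, `missed_concept`) are assumptions based on the grading description above.

```python
import json
from typing import Callable, Optional

# Hypothetical safe default returned when the model never produces usable JSON.
SAFE_FALLBACK = {
    "score": 0,
    "explanation": "Could not grade this answer automatically.",
    "missed_concept": "",
}


def extract_json(text: str) -> Optional[dict]:
    """Pull one JSON object out of a reply that may be wrapped in prose or code fences."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def parse_with_repair(reply: str, ask_again: Callable[[str], str]) -> dict:
    """Parse a reply; on failure ask the model once to repair it, then fall back."""
    parsed = extract_json(reply)
    if parsed is not None:
        return parsed
    # Exactly one repair attempt: send the broken reply back and ask for JSON only.
    repaired = ask_again("Return only a valid JSON object for this reply:\n" + reply)
    parsed = extract_json(repaired)
    return parsed if parsed is not None else dict(SAFE_FALLBACK)
```

Here `ask_again` stands in for a call through the `chat()` wrapper. Capping the retry at one keeps worst-case latency at two generations per answer.
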

**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.

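
A rough sketch of that switch, for readers who want to see the shape of it. The `chat(messages, max_tokens)` signature and the `RECALL_STUB` variable come from this README; the canned reply, the `stub_enabled` helper and the rule that any value other than `0` means stub mode are assumptions.

```python
import os

# Hypothetical canned reply; the real stubs live behind each module's own functions.
CANNED_REPLY = '{"score": 3, "explanation": "Canned stub reply.", "missed_concept": "example"}'


def stub_enabled() -> bool:
    """Stub mode is on unless RECALL_STUB is explicitly set to "0" (assumed rule)."""
    return os.environ.get("RECALL_STUB", "1") != "0"


def chat(messages, max_tokens=256):
    """Return a canned reply in stub mode; the real model path is left out of this sketch."""
    if stub_enabled():
        return CANNED_REPLY
    raise NotImplementedError("real-model inference is omitted from this sketch")
```

Because callers only ever see `chat()`, the rest of the app does not need to know which mode is active.
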

**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):

```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py     # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py     # MiniCPM5-1B
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py     # MiniCPM3-4B
```

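
One way such a config flip could be wired is a small lookup table keyed by `RECALL_MODEL`. The sketch below is illustrative only: the `8b`, `1b` and `4b` keys and the model names come from the commands above, while the `v` key for the multimodal default, the `ModelChoice` type and the "unknown key falls back to the default" rule are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelChoice:
    name: str     # model name as written in this README
    vision: bool  # True if the image/OCR path is available


MODELS = {
    "v": ModelChoice("openbmb/MiniCPM-V-4.6", vision=True),  # hypothetical key for the default
    "8b": ModelChoice("MiniCPM4.1-8B", vision=False),
    "1b": ModelChoice("MiniCPM5-1B", vision=False),
    "4b": ModelChoice("MiniCPM3-4B", vision=False),
}
DEFAULT_KEY = "v"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> ModelChoice:
    """Pick the model from RECALL_MODEL, falling back to the multimodal default."""
    env = os.environ if env is None else env
    key = env.get("RECALL_MODEL", DEFAULT_KEY).strip().lower()
    return MODELS.get(key, MODELS[DEFAULT_KEY])
```

The `vision` flag gives the content pipeline a single place to check before it sends page images to the model.
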
## Project layout

| File | Owner | What it is |
|------|-------|-----------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
| `server.py` | — | `gradio.Server` (FastAPI subclass) app: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |

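
For readers unfamiliar with SM-2, the scheduling idea behind `learning_engine.py` can be sketched with a simplified SM-2-style update driven by the 0–5 grade. This is not the project's "SM-2-lite" code, whose details are not given here: `ReviewState`, `sm2_update`, the choice to leave the ease factor unchanged on a failed review, and the use of the pre-update ease for the interval are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never allowed below 1.3


def sm2_update(state: ReviewState, score: int) -> ReviewState:
    """Return the next review state for a card graded 0-5."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score < 3:
        # Failed recall: restart the card and see it again tomorrow; ease is left unchanged.
        return ReviewState(repetitions=0, interval_days=1, ease=state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease)
    # Standard SM-2 ease adjustment: high scores raise ease, a bare pass lowers it.
    miss = 5 - score
    ease = max(1.3, state.ease + 0.1 - miss * (0.08 + miss * 0.02))
    return ReviewState(state.repetitions + 1, interval, ease)
```

Within a single study session the intervals matter less than the reset on a low score, which is what pushes a missed card back to the front of the queue.
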
## How to work in parallel

1. At kickoff, lock `schema.py` together.
2. Each module already ships **working stubs**: build your real logic behind the
   same function signatures, then flip `RECALL_STUB=0` to test against the real model.
3. Don't change public function signatures without telling the team.

## The judging hook

The small model is load-bearing in three visible places: **grading free-text
answers with explanations**, **generating follow-up questions that drill the
exact concept you missed**, and **reading scanned/photographed material** to build
the deck. Make sure the demo shows them.