---
title: Recall — AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:offbrand
---

# 📚 Recall: an AI study partner that gets smarter about what you get wrong

Upload your study material (typed notes, a PDF, even a photo or scan of a page) → Recall generates a quiz deck → you answer → a small model grades and explains each answer → **it generates new questions targeting exactly what you missed** → end-of-session recap.

Built for the **Build Small Hackathon** (Backyard AI track).

- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6), a multimodal model that grades text **and** reads images/scans. Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- **Platform:** Gradio app, hosted as a Hugging Face Space
- **Demo video:** [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- **Social post:** [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)

## Team

| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |

## Run it (stub mode: no GPU, no model download)

```bash
pip install -r requirements.txt
python server.py   # http://127.0.0.1:7860 ← polished custom frontend
```

Everything works end-to-end on canned data, so anyone can clone the repo and click through the full loop within the first minute.

`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON API over the existing backend. The learning/content logic and the `schema.py` data contract are treated as a fixed API and are never modified.
It's built on `gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds port 7860 directly while the main thread is held open. The original Gradio form is still available standalone via `python app.py`.

## Run with the real model

The heavy model deps (torch/transformers/…) are kept out of `requirements.txt` so the Space build stays fast in stub mode. Install them from the model requirements file:

```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```

> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (the app starts, then the process exits →
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version that gradio's own ZeroGPU
> `Server` reference example ships with, and it runs cleanly. It also resolves with
> the real model: MiniCPM-V 4.6 runs on **transformers 5.x**, which wants
> **huggingface-hub 1.x**, and gradio 6.10.0 allows `huggingface-hub <2.0,>=0.33.5`
> (so hub 1.x is permitted). A gradio-SDK Space force-installs a single gradio
> version for the whole Space, so stub mode and the real model share it without
> needing a Docker Space. Keep `requirements.txt`, `requirements-model.txt` and the
> Space `sdk_version` in lockstep. The smaller text fallbacks add no extra constraint.

**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combination produces garbage output (a known MPS bf16 instability that does not occur on the Space's CUDA GPU).
For a clean local real-model smoke test, force CPU/float32:

```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```

## The model

Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track. It is small enough to serve on a single Hugging Face ZeroGPU Space, yet capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.

**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:

- **Grading:** it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups:** from that missed concept it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR:** image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.

**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU, in `bf16` with `device_map="auto"`. The multimodal default loads via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, and the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer`. The GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback, so a malformed generation can never crash the study loop.
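A minimal sketch of the "parse defensively, retry once, then fall back" pattern described above, applied to a grading reply. It assumes the model is asked for a JSON object with `score`, `explanation` and `missed_concept` keys. Those key names, the `Grade` dataclass (a stand-in for `GradeResult` in `schema.py`, whose fields aren't shown here) and the `repair` callable (which in practice would re-prompt the model through `chat()`) are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Grade:
    """Hypothetical stand-in for schema.GradeResult; the real fields may differ."""
    score: int            # 0-5
    explanation: str
    missed_concept: str


# Safe fallback returned when neither the reply nor the repaired reply parses.
FALLBACK = Grade(score=0, explanation="Could not grade this answer automatically.", missed_concept="")


def _extract_json(text: str) -> dict:
    # Models often wrap JSON in prose or code fences; keep the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in reply")
    return json.loads(text[start : end + 1])  # json.JSONDecodeError subclasses ValueError


def _to_grade(obj: dict) -> Grade:
    # Clamp the score into the 0-5 range the UI expects.
    score = max(0, min(5, int(obj["score"])))
    return Grade(score, str(obj.get("explanation", "")), str(obj.get("missed_concept", "")))


def parse_grade(reply: str, repair: Callable[[str], str]) -> Grade:
    """Parse a grading reply; on failure ask `repair` for one corrected reply, then fall back."""
    try:
        return _to_grade(_extract_json(reply))
    except (ValueError, KeyError, TypeError):
        pass
    try:
        # One repair retry (hypothetical: a second chat() call asking for valid JSON only).
        return _to_grade(_extract_json(repair(reply)))
    except (ValueError, KeyError, TypeError):
        return FALLBACK
```

Every failure path ends in `FALLBACK`, so a malformed generation degrades to a neutral result instead of an exception. Whether a failed grade should count as 0 or simply be skipped is a product decision this sketch does not settle.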
**Stub mode.** With `RECALL_STUB=1` (the default), `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Set `RECALL_STUB=0` to use the real model.

**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL`. The rest of the pipeline is unchanged, except that the text-only fallbacks drop the image/OCR path:

```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py   # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py   # MiniCPM5-1B
# mid fallback: ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py   # MiniCPM3-4B
```

## Project layout

| File | Owner | What it is |
|------|-------|------------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change it without a team sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans are rendered to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State`; standalone fallback (`python app.py`). |
| `server.py` | n/a | FastAPI server: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | n/a | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |

## How to work in parallel

1. At kickoff, lock `schema.py` together.
2. Each module already ships **working stubs**. Build your real logic behind the same function signatures, then set `RECALL_STUB=0` to test for real.
3. Don't change public function signatures without telling the team.
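Step 2 in the list above only works if every module honors the same environment switches. Below is a minimal sketch of how `RECALL_STUB` and `RECALL_MODEL` could be read behind a single `chat()` signature. It assumes three things: stub mode stays on unless `RECALL_STUB=0` (as documented above), an unset `RECALL_MODEL` means the multimodal default, and the aliases `8b`/`1b`/`4b` map to the fallbacks listed earlier. The helper names, the canned reply and the `generate` hook (standing in for the real transformers call) are hypothetical. The real logic lives in `llm.py`, may differ, and loads actual Hub repo IDs that are not reproduced here.

```python
import json
import os
from typing import Callable, Mapping, Optional

DEFAULT_MODEL = "MiniCPM-V-4.6"  # multimodal default (text + images/scans)

# RECALL_MODEL aliases from this README -> display names (not Hub repo IDs).
FALLBACK_MODELS = {
    "8b": "MiniCPM4.1-8B",
    "1b": "MiniCPM5-1B",
    "4b": "MiniCPM3-4B",
}


def stub_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Stub mode is the default; only an explicit RECALL_STUB=0 turns it off.
    return env.get("RECALL_STUB", "1").strip() != "0"


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    alias = env.get("RECALL_MODEL", "").strip().lower()
    if not alias:
        return DEFAULT_MODEL
    if alias not in FALLBACK_MODELS:
        raise ValueError(f"unknown RECALL_MODEL={alias!r}; expected one of {sorted(FALLBACK_MODELS)}")
    return FALLBACK_MODELS[alias]


def chat(
    messages: list,
    max_tokens: int = 256,
    env: Mapping[str, str] = os.environ,
    generate: Optional[Callable[[str, list, int], str]] = None,
) -> str:
    """Same signature in stub and real mode, so callers never branch on the mode."""
    if stub_enabled(env):
        # Hypothetical canned reply so the whole loop runs with no GPU and no download.
        return json.dumps({"score": 3, "explanation": "Stub grade.", "missed_concept": "stub concept"})
    if generate is None:
        raise RuntimeError("RECALL_STUB=0 but no model backend was provided")
    return generate(resolve_model(env), messages, max_tokens)
```

Failing fast on an unknown alias means a typo in the Space settings surfaces as a clear error. Without that check, it would silently load the default model.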
## The judging hook

The small model is load-bearing in three visible places: **grading free-text answers with explanations**, **generating follow-up questions that drill the exact concept you missed**, and **reading scanned/photographed material** to build the deck. Make sure the demo shows all three.