Spaces:
Running on Zero
Running on Zero
File size: 7,658 Bytes
9817fce 7563305 9817fce efca112 7563305 9817fce 35c32e6 9817fce 7563305 35c32e6 7563305 35c32e6 7563305 5930af9 7563305 35c32e6 7563305 2220375 35c32e6 7563305 35c32e6 efca112 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 7563305 35c32e6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | ---
title: Recall — AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
---
# 📚 Recall — an AI study partner that gets smarter about what you get wrong
Upload your study material — typed notes, a PDF, even a photo or scan of a page →
Recall generates a quiz deck → you answer → a small model grades and explains each
answer → **it generates new questions targeting exactly what you missed** →
end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).
- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- **Platform:** Gradio app, hosted as a Hugging Face Space
- **Demo video:** [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- **Social post:** [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)
## Team
| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |
## Run it (stub mode — no GPU, no model download)
```bash
pip install -r requirements.txt
python server.py # http://127.0.0.1:7860 ← polished custom frontend
```
Everything works end-to-end on canned data, so anyone can clone and click through
the full loop in minute one.
`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
API over the existing backend — the learning/content logic and the `schema.py`
data contract are treated as an API and are never modified. It's built on
`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
port 7860 directly while the main thread is held open. The original Gradio form is
still available standalone via `python app.py`.
## Run with the real model
The heavy model deps (torch/transformers/…) are kept out of `requirements.txt` so
the Space build stays fast in stub mode. Install them with the model requirements:
```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```
> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (app starts, process exits →
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
> reference example ships and runs cleanly. It also resolves with the real model:
> MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
> and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
> Space force-installs one gradio for the whole Space, so stub and real-model
> share it without a Docker Space — keep `requirements.txt`,
> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
> text fallbacks add no extra constraint.
**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
a clean local real-model smoke test, force CPU/float32:
```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```
## The model
Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.
**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.
**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.
**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.
**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):
```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py # MiniCPM5-1B
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py # MiniCPM3-4B
```
## Project layout
| File | Owner | What it is |
|------|-------|-----------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
## How to work in parallel
1. At kickoff, lock `schema.py` together.
2. Each module already ships **working stubs** — build your real logic behind the
same function signatures, flip `RECALL_STUB=0` to test for real.
3. Don't change public function signatures without telling the team.
## The judging hook
The small model is load-bearing in three visible places: **grading free-text
answers with explanations**, **generating follow-up questions that drill the
exact concept you missed**, and **reading scanned/photographed material** to build
the deck. Make sure the demo shows them.
|