File size: 7,658 Bytes
9817fce
7563305
 
 
 
9817fce
efca112
7563305
9817fce
 
35c32e6
 
 
 
 
9817fce
 
7563305
 
35c32e6
 
 
 
7563305
35c32e6
7563305
5930af9
 
7563305
35c32e6
 
 
 
 
 
 
 
7563305
 
 
 
 
 
 
 
 
 
 
 
2220375
 
35c32e6
 
 
7563305
 
 
 
 
 
 
 
 
 
 
35c32e6
 
 
efca112
35c32e6
 
 
 
 
 
 
7563305
 
 
 
 
 
 
 
 
 
 
35c32e6
7563305
35c32e6
7563305
 
35c32e6
7563305
35c32e6
7563305
 
 
35c32e6
7563305
 
35c32e6
 
7563305
35c32e6
 
 
7563305
 
 
 
 
 
 
 
 
35c32e6
 
7563305
 
 
 
 
 
 
 
 
 
35c32e6
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
title: Recall  AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
---

# 📚 Recall — an AI study partner that gets smarter about what you get wrong

Upload your study material — typed notes, a PDF, even a photo or scan of a page →
Recall generates a quiz deck → you answer → a small model grades and explains each
answer → **it generates new questions targeting exactly what you missed** →
end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).

- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- **Platform:** Gradio app, hosted as a Hugging Face Space
- **Demo video:** [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- **Social post:** [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)

## Team

| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |

## Run it (stub mode — no GPU, no model download)

```bash
pip install -r requirements.txt
python server.py         # http://127.0.0.1:7860  ← polished custom frontend
```

Everything works end-to-end on canned data, so anyone can clone and click through
the full loop in minute one.

`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
API over the existing backend — the learning/content logic and the `schema.py`
data contract are treated as an API and are never modified. It's built on
`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
port 7860 directly while the main thread is held open. The original Gradio form is
still available standalone via `python app.py`.

## Run with the real model

The heavy model deps (torch/transformers/…) are kept out of `requirements.txt` so
the Space build stays fast in stub mode. Install them with the model requirements:

```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```

> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (app starts, process exits →
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
> reference example ships and runs cleanly. It also resolves with the real model:
> MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
> and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
> Space force-installs one gradio for the whole Space, so stub and real-model
> share it without a Docker Space — keep `requirements.txt`,
> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
> text fallbacks add no extra constraint.

**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
a clean local real-model smoke test, force CPU/float32:

```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```

## The model

Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.

**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.

**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.

**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.

**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):

```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py   # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py   # MiniCPM5-1B
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py   # MiniCPM3-4B
```

## Project layout

| File | Owner | What it is |
|------|-------|-----------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |

## How to work in parallel
1. At kickoff, lock `schema.py` together.
2. Each module already ships **working stubs** — build your real logic behind the
   same function signatures, flip `RECALL_STUB=0` to test for real.
3. Don't change public function signatures without telling the team.

## The judging hook
The small model is load-bearing in three visible places: **grading free-text
answers with explanations**, **generating follow-up questions that drill the
exact concept you missed**, and **reading scanned/photographed material** to build
the deck. Make sure the demo shows them.