Spaces:

build-small-hackathon
/

study-partner

Running on Zero

File size: 7,658 Bytes

---
title: Recall — AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
---

# 📚 Recall — an AI study partner that gets smarter about what you get wrong

Upload your study material — typed notes, a PDF, even a photo or scan of a page →
Recall generates a quiz deck → you answer → a small model grades and explains each
answer → **it generates new questions targeting exactly what you missed** →
end-of-session recap. Built for the **Build Small Hackathon** (Backyard AI track).

- **Model:** [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) — multimodal (grades text **and** reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- **Platform:** Gradio app, hosted as a Hugging Face Space
- **Demo video:** [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- **Social post:** [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)

## Team

| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |

## Run it (stub mode — no GPU, no model download)

```bash
pip install -r requirements.txt
python server.py         # http://127.0.0.1:7860  ← polished custom frontend
```

Everything works end-to-end on canned data, so anyone can clone and click through
the full loop in minute one.

`server.py` serves the **Recall** design (`frontend/index.html`) and a thin JSON
API over the existing backend — the learning/content logic and the `schema.py`
data contract are treated as an API and are never modified. It's built on
`gradio.Server` (a FastAPI subclass), so the same gradio-SDK Space that installs
gradio also runs the custom frontend; `app.launch(prevent_thread_lock=True)` binds
port 7860 directly while the main thread is held open. The original Gradio form is
still available standalone via `python app.py`.

## Run with the real model

The heavy model deps (torch/transformers/…) are kept out of `requirements.txt` so
the Space build stays fast in stub mode. Install them with the model requirements:

```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```

> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (app starts, process exits →
> `RUNTIME_ERROR`). **gradio 6.10.0** is the version gradio's own ZeroGPU `Server`
> reference example ships and runs cleanly. It also resolves with the real model:
> MiniCPM-V 4.6 runs on **transformers 5.x**, which wants **huggingface-hub 1.x**,
> and 6.10.0 allows `huggingface-hub <2.0,>=0.33.5` (i.e. hub 1.x). A gradio-SDK
> Space force-installs one gradio for the whole Space, so stub and real-model
> share it without a Docker Space — keep `requirements.txt`,
> `requirements-model.txt` and the Space `sdk_version` in lockstep. The smaller
> text fallbacks add no extra constraint.

**On Apple Silicon (M1/M2/…),** the default bf16 + MPS combo produces garbage
output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For
a clean local real-model smoke test, force CPU/float32:

```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```

## The model

Recall runs on **[openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6)**, an open **multimodal** model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, **and read scanned or photographed material directly**. One model does both the text and the vision work.

**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:
- **Grading** — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups** — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR** — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.

**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via `MiniCPMV4_6ForConditionalGeneration` + an `AutoProcessor`, the text-only fallbacks via `AutoModelForCausalLM` + `AutoTokenizer` — in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.

**Stub mode.** With `RECALL_STUB=1` (the default) `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip `RECALL_STUB=0` to use the real model.

**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL` — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):

```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py   # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py   # MiniCPM5-1B
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py   # MiniCPM3-4B
```

## Project layout

| File | Owner | What it is |
|------|-------|-----------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change without a sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans render to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State` — standalone fallback (`python app.py`). |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | — | The polished **Recall** design (Upload / Study / Recap), vanilla HTML/CSS/JS. |

## How to work in parallel
1. At kickoff, lock `schema.py` together.
2. Each module already ships **working stubs** — build your real logic behind the
   same function signatures, flip `RECALL_STUB=0` to test for real.
3. Don't change public function signatures without telling the team.

## The judging hook
The small model is load-bearing in three visible places: **grading free-text
answers with explanations**, **generating follow-up questions that drill the
exact concept you missed**, and **reading scanned/photographed material** to build
the deck. Make sure the demo shows them.