---
title: Recall — AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
---

# 📚 Recall: an AI study partner that gets smarter about what you get wrong

Upload your study material (typed notes, a PDF, even a photo or scan of a page) →
Recall generates a quiz deck → you answer → a small model grades and explains each
answer → it generates new questions targeting exactly what you missed → you get an
end-of-session recap. Built for the Build Small Hackathon (Backyard AI track).
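
The study loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `learning_engine.py`: `Card`, `Grade`, `stub_grade`, `stub_follow_ups` and `run_session` are hypothetical stand-ins, the pass threshold of 3 is an assumption, and the canned grader only checks for an exact match where Recall would ask the model.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


# Hypothetical stand-ins: the real types live in schema.py, and the real grader
# and follow-up writer call the model through llm.py.
@dataclass
class Card:
    question: str
    answer: str
    concept: str


@dataclass
class Grade:
    score: int                  # 0-5, like the model's grade
    missed_concept: str | None  # None when the answer was good enough


def stub_grade(card: Card, user_answer: str) -> Grade:
    # Canned grader: an exact (case-insensitive) match scores 5, anything else 1.
    ok = user_answer.strip().lower() == card.answer.strip().lower()
    return Grade(score=5 if ok else 1, missed_concept=None if ok else card.concept)


def stub_follow_ups(concept: str) -> list[Card]:
    # Canned follow-up writer: one new card that drills the missed concept.
    return [Card(f"Explain {concept} in your own words.", concept, concept)]


def run_session(deck: list[Card], answers: dict[str, str], pass_score: int = 3) -> dict:
    queue = deque(deck)
    missed: list[str] = []
    seen = 0
    while queue:
        card = queue.popleft()
        seen += 1
        grade = stub_grade(card, answers.get(card.question, ""))
        if grade.score < pass_score and grade.missed_concept:
            missed.append(grade.missed_concept)
            # Adaptive step: queue follow-ups only the first time a concept is missed.
            if missed.count(grade.missed_concept) == 1:
                queue.extend(stub_follow_ups(grade.missed_concept))
    # End-of-session recap: how many cards were asked and which concepts were weak.
    return {"cards_seen": seen, "missed_concepts": sorted(set(missed))}


deck = [Card("2 + 2?", "4", "addition"), Card("Capital of France?", "Paris", "geography")]
print(run_session(deck, {"2 + 2?": "4", "Capital of France?": "Lyon"}))
# → {'cards_seen': 3, 'missed_concepts': ['geography']}
```

Queueing follow-ups only on the first miss of a concept keeps one weak concept from flooding the session; Recall's actual adaptation policy may differ.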

- Model: [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6), a multimodal model (it grades text and reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
- Platform: Gradio app, hosted as a Hugging Face Space
- Demo video: [YouTube](https://youtube.com/shorts/8_EfO4Pmhyg)
- Social post: [LinkedIn](https://www.linkedin.com/posts/francisco-javier-magana-palomeque_were-building-recall-a-learning-tool-that-ugcPost-7472392761250488320-_ngD/)

## Team

| Member | Hugging Face |
|--------|--------------|
| Nikolai | [@nz-nz](https://huggingface.co/nz-nz) |
| Frank | [@francisco-magana](https://huggingface.co/francisco-magana) |
| Arturo | [@arturogp3](https://huggingface.co/arturogp3) |

## Run it (stub mode: no GPU, no model download)

```bash
pip install -r requirements.txt
python server.py  # http://127.0.0.1:7860 ← polished custom frontend
```

Everything works end-to-end on canned data, so anyone can clone the repo and click
through the full loop within the first minute.

`server.py` serves the Recall design (`frontend/index.html`) plus a thin JSON
API over the existing backend. The learning/content logic and the `schema.py`
data contract are treated as a fixed API: the server calls them but never modifies
them. It's built on `gradio.Server` (a FastAPI subclass), so the same gradio-SDK
Space that installs gradio also runs the custom frontend;
`app.launch(prevent_thread_lock=True)` binds port 7860 directly without blocking,
and the main thread is then held open so the process keeps serving. The original
Gradio form is still available standalone via `python app.py`.

## Run with the real model

The heavy model dependencies (torch, transformers, …) are kept out of
`requirements.txt` so the Space build stays fast in stub mode. Install them from
`requirements-model.txt`:

```bash
pip install -r requirements-model.txt
RECALL_STUB=0 python server.py
```

> **Dependency pins (why gradio is 6.10.0).** The binding constraint is the
> custom-frontend server: it uses `gradio.Server`, and on gradio 6.17.x a custom
> `Server` breaks under a Space's runtime (the app starts, then the process exits
> with `RUNTIME_ERROR`). gradio 6.10.0 is the version that gradio's own ZeroGPU
> `Server` reference example ships with, and it runs cleanly. It is also compatible
> with the real-model stack: MiniCPM-V 4.6 runs on transformers 5.x, which requires
> huggingface-hub 1.x, and gradio 6.10.0 allows `huggingface-hub <2.0,>=0.33.5`
> (i.e. hub 1.x). A gradio-SDK Space force-installs a single gradio version for the
> whole Space, so stub mode and real-model mode share it without needing a Docker
> Space. Keep `requirements.txt`, `requirements-model.txt` and the Space
> `sdk_version` in lockstep. The smaller text fallbacks add no extra constraint.
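
Because the three pins have to move together, a small pre-deploy check can catch drift. The sketch below is hypothetical (it is not part of the repo): it reads `sdk_version` from the README front matter and compares it with every `gradio==<version>` pin it finds in the requirements files, assuming gradio is pinned with `==`.

```python
from __future__ import annotations

import re


def frontmatter_sdk_version(readme: str) -> str | None:
    # Look for `sdk_version:` only inside the YAML front matter (first `---` block).
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        m = re.match(r"\s*sdk_version:\s*([\w.]+)\s*$", line)
        if m:
            return m.group(1)
    return None


def gradio_pin(requirements: str) -> str | None:
    # Assumption: gradio is pinned as `gradio==X` (optionally `gradio[extra]==X`).
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        m = re.match(r"gradio(?:\[[^\]]*\])?\s*==\s*([\w.]+)$", line)
        if m:
            return m.group(1)
    return None


def pins_in_lockstep(readme: str, *requirement_files: str) -> bool:
    sdk = frontmatter_sdk_version(readme)
    # Files that don't mention gradio are skipped; every pin that exists must match.
    found = {pin for pin in map(gradio_pin, requirement_files) if pin is not None}
    return sdk is not None and found == {sdk}
```

Call it with the contents of `README.md`, `requirements.txt` and `requirements-model.txt`, and fail the build when it returns `False`. Requiring at least one pin means an accidentally unpinned gradio also fails the check.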

On Apple Silicon (M1/M2/…), the default bf16 + MPS combination produces garbage
output (a known MPS bf16 instability that does not affect the Space's CUDA GPU).
For a clean local real-model smoke test, force CPU and float32 (the command below
also selects the 1B fallback model):

```bash
RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py
```
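
One way the `RECALL_DTYPE` and `RECALL_DEVICE` overrides could be resolved is sketched below, assuming the defaults described under "How inference is served" (bf16 with `device_map="auto"`). `resolve_precision`, `BF16_NAMES` and the warning text are hypothetical; this is not the actual `llm.py` code.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical sketch: defaults follow the README (bf16, device_map="auto");
# RECALL_DTYPE / RECALL_DEVICE override them. Not the actual llm.py code.
BF16_NAMES = {"bfloat16", "bf16"}


def resolve_precision(env: Mapping[str, str] = os.environ) -> dict:
    dtype = env.get("RECALL_DTYPE", "bfloat16")
    device = env.get("RECALL_DEVICE", "auto")
    warning = None
    if device == "mps" and dtype in BF16_NAMES:
        # Known-bad combination on Apple Silicon: generations come out as garbage.
        warning = "bf16 on MPS is unstable; try RECALL_DEVICE=cpu RECALL_DTYPE=float32"
    return {"dtype": dtype, "device_map": device, "warning": warning}


print(resolve_precision({"RECALL_DEVICE": "cpu", "RECALL_DTYPE": "float32"}))
# → {'dtype': 'float32', 'device_map': 'cpu', 'warning': None}
```

The guard only sees an explicit `RECALL_DEVICE=mps`. With `device_map="auto"` the device is picked at load time and may still end up on MPS, so a fuller check would have to inspect the loaded model.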

## The model

Recall runs on [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6), an open multimodal model from OpenBMB, chosen for the Backyard AI track because it is small enough to serve on a single Hugging Face ZeroGPU Space yet capable enough to grade free-text answers, write grounded follow-up questions, and read scanned or photographed material directly. One model handles both the text and the vision work.

**Where the model is load-bearing.** Three user-visible features are pure model work, not templated strings:

- **Grading:** it compares your free-text answer with the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
- **Adaptive follow-ups:** from that missed concept, it writes brand-new questions that drill exactly what you got wrong.
- **Vision / OCR:** image-only or scanned PDFs with no selectable text are rendered to images and read directly by the model to build the deck (`content_pipeline.py`), so slide photos and scans work, not just digital text.
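
The "no selectable text" test that sends a scan to the vision path can be as simple as counting extractable characters per page. A minimal sketch, assuming per-page text has already been pulled out by a PDF library; the 30-character threshold and the `route_pages` / `PageRoute` names are hypothetical, and `content_pipeline.py` may decide differently (for example per document rather than per page).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical threshold: pages with fewer visible characters than this are
# treated as scans and rendered to images for the vision model.
MIN_TEXT_CHARS = 30


@dataclass
class PageRoute:
    page: int  # 0-based page index
    mode: str  # "text" or "vision"


def route_pages(page_texts: list[str], min_chars: int = MIN_TEXT_CHARS) -> list[PageRoute]:
    routes = []
    for i, text in enumerate(page_texts):
        # Count only non-whitespace characters; scanned pages often yield blanks.
        visible = sum(1 for ch in text if not ch.isspace())
        routes.append(PageRoute(i, "text" if visible >= min_chars else "vision"))
    return routes


pages = ["Photosynthesis converts light energy into chemical energy.", "  \n ", ""]
print([r.mode for r in route_pages(pages)])  # → ['text', 'vision', 'vision']
```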

**How inference is served.** Everything model-related goes through a single `chat(messages, max_tokens)` wrapper in `llm.py`; no other module imports `transformers` directly. The model is loaded once, lazily, on the Space's ZeroGPU: the multimodal default via `MiniCPMV4_6ForConditionalGeneration` plus an `AutoProcessor`, and the text-only fallbacks via `AutoModelForCausalLM` plus `AutoTokenizer`. Weights load in `bf16` with `device_map="auto"`, and the GPU entrypoint is wrapped in `@spaces.GPU`. `max_tokens` is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback, so a malformed generation can never crash the study loop.

**Stub mode.** With `RECALL_STUB=1` (the default), `chat()` returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Set `RECALL_STUB=0` to use the real model.
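
A minimal sketch of the stub switch, assuming the flag is read from the environment on every call. `CANNED_REPLY` and `_real_chat` are hypothetical placeholders; the real stubs in `llm.py` may return different canned replies per task.

```python
import os

# Hypothetical canned reply, shaped like a grading result so callers can parse it.
CANNED_REPLY = '{"score": 3, "explanation": "Partly right (stub).", "missed_concept": "stub concept"}'


def stub_enabled() -> bool:
    # Stub mode is the default: only an explicit RECALL_STUB=0 selects the real model.
    return os.environ.get("RECALL_STUB", "1") != "0"


def _real_chat(messages: list, max_tokens: int) -> str:
    # Placeholder for the lazily loaded model path, which this sketch leaves out.
    raise NotImplementedError("real-model path not included in this sketch")


def chat(messages: list, max_tokens: int = 256) -> str:
    if stub_enabled():
        return CANNED_REPLY  # no GPU, no download: every call gets the canned reply
    return _real_chat(messages, max_tokens)
```

Reading the variable on each call, rather than once at import, is a choice made here so tests can toggle it; the real module may read it once at startup.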

**Fallback (config flip, no code change).** If the Space is too slow or runs out of memory, swap to a smaller model by setting `RECALL_MODEL`. The rest of the pipeline is unchanged, except that the text-only fallbacks drop the image/OCR path:

```bash
# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py  # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py  # MiniCPM5-1B
# mid fallback: ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py  # MiniCPM3-4B
```
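
The alias handling can be sketched as a small lookup table. The README gives a Hub repo ID only for the default model, so the fallbacks below use the names listed above rather than verified repo IDs. `ModelChoice`, `MODELS` and `resolve_model`, including treating an unset `RECALL_MODEL` as the default and rejecting unknown aliases, are hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping, NamedTuple


class ModelChoice(NamedTuple):
    name: str
    multimodal: bool  # text-only fallbacks drop the image/OCR path


# Hypothetical alias table; "" stands for "RECALL_MODEL not set" (the default).
MODELS = {
    "": ModelChoice("openbmb/MiniCPM-V-4.6", True),
    "8b": ModelChoice("MiniCPM4.1-8B", False),
    "4b": ModelChoice("MiniCPM3-4B", False),
    "1b": ModelChoice("MiniCPM5-1B", False),
}


def resolve_model(env: Mapping[str, str] = os.environ) -> ModelChoice:
    alias = env.get("RECALL_MODEL", "").strip().lower()
    if alias not in MODELS:
        known = sorted(k for k in MODELS if k)
        raise ValueError(f"unknown RECALL_MODEL={alias!r}; expected one of {known}")
    return MODELS[alias]
```

Carrying a `multimodal` flag lets the content pipeline skip the image/OCR path up front when a text-only fallback is active, instead of failing on the first scanned page.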

## Project layout

| File | Owner | What it is |
|------|-------|------------|
| `schema.py` | shared | The data contract (`Card`, `CardState`, `GradeResult`, `Session`). Don't change it without a team sync. |
| `llm.py` | Nikolai | Shared MiniCPM inference wrapper + defensive JSON parsing. |
| `learning_engine.py` | Nikolai | Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap. |
| `content_pipeline.py` | Frank | Text & image PDFs → chunks (scans are rendered to page images for the vision model) → question cards. |
| `app.py` | Arturo | Gradio UI (Upload / Study / Recap) over `gr.State`; standalone fallback (`python app.py`). |
| `server.py` | — | FastAPI server: serves the custom frontend + JSON API over the backend. |
| `frontend/index.html` | — | The polished Recall design (Upload / Study / Recap), vanilla HTML/CSS/JS. |
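
`learning_engine.py` schedules reviews with "SM-2-lite", whose exact rules aren't spelled out here. The name suggests a simplified form of the classic SM-2 algorithm, which is sketched below for reference; conveniently, SM-2 uses the same 0–5 quality scale as the grader. `ReviewState` and `sm2_update` are illustrative names, and the "lite" variant may simplify or drop parts of this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 easiness factor, floored at 1.3


def sm2_update(state: ReviewState, quality: int) -> ReviewState:
    """Classic SM-2 step for a 0-5 quality grade (3 or more counts as a pass)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.ease)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: restart the card with a one-day interval.
        repetitions, interval = 0, 1
    # Ease rises slightly for perfect answers and drops faster the worse the grade.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return ReviewState(repetitions, interval, max(1.3, ease))


s = ReviewState()
for q in (5, 5, 4):
    s = sm2_update(s, q)
print(s.repetitions, s.interval_days)  # → 3 16
```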

## How to work in parallel

1. At kickoff, lock `schema.py` together.
2. Each module already ships working stubs: build your real logic behind the same function signatures, then flip `RECALL_STUB=0` to test for real.
3. Don't change public function signatures without telling the team.
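
Rule 3 can be checked mechanically. A minimal sketch using the standard library's `inspect.signature` to compare a stub with its real replacement by parameter names, kinds and defaults; `same_signature` and the two example functions are hypothetical, not part of the repo.

```python
import inspect
from typing import Callable


def same_signature(stub: Callable, real: Callable) -> bool:
    # Compare parameter names, kinds and defaults; annotations are ignored so a
    # real implementation can tighten type hints without tripping the check.
    def shape(fn: Callable) -> list:
        return [(p.name, p.kind, p.default) for p in inspect.signature(fn).parameters.values()]

    return shape(stub) == shape(real)


def grade_stub(card, answer, max_tokens=256):  # hypothetical stub
    return {"score": 3}


def grade_real(card, answer, max_tokens=256):  # hypothetical real version
    return {"score": 5}


print(same_signature(grade_stub, grade_real))  # → True
```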

## The judging hook

The small model is load-bearing in three visible places: **grading free-text
answers with explanations, generating follow-up questions that drill the
exact concept you missed, and reading scanned/photographed material** to build
the deck. Make sure the demo shows all three.