study-partner / README.md
nz-nz's picture
Sync from GitHub via hub-sync
5930af9 verified
|
Raw
History Blame Contribute Delete
7.66 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade
metadata
title: Recall  AI Study Partner
emoji: 📚
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: server.py
pinned: false
license: mit
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:offbrand

📚 Recall — an AI study partner that gets smarter about what you get wrong

Upload your study material — typed notes, a PDF, even a photo or scan of a page → Recall generates a quiz deck → you answer → a small model grades and explains each answer → it generates new questions targeting exactly what you missed → end-of-session recap. Built for the Build Small Hackathon (Backyard AI track).

  • Model: openbmb/MiniCPM-V-4.6 — multimodal (grades text and reads images/scans). Text-only fallbacks: MiniCPM4.1-8B, MiniCPM5-1B, MiniCPM3-4B.
  • Platform: Gradio app, hosted as a Hugging Face Space
  • Demo video: YouTube
  • Social post: LinkedIn

Team

Member Hugging Face
Nikolai @nz-nz
Frank @francisco-magana
Arturo @arturogp3

Run it (stub mode — no GPU, no model download)

pip install -r requirements.txt
python server.py         # http://127.0.0.1:7860  ← polished custom frontend

Everything works end-to-end on canned data, so anyone can clone and click through the full loop in minute one.

server.py serves the Recall design (frontend/index.html) and a thin JSON API over the existing backend — the learning/content logic and the schema.py data contract are treated as an API and are never modified. It's built on gradio.Server (a FastAPI subclass), so the same gradio-SDK Space that installs gradio also runs the custom frontend; app.launch(prevent_thread_lock=True) binds port 7860 directly while the main thread is held open. The original Gradio form is still available standalone via python app.py.

Run with the real model

The heavy model deps (torch/transformers/…) are kept out of requirements.txt so the Space build stays fast in stub mode. Install them with the model requirements:

pip install -r requirements-model.txt
RECALL_STUB=0 python server.py

Dependency pins (why gradio is 6.10.0). The binding constraint is the custom-frontend server: it uses gradio.Server, and on gradio 6.17.x a custom Server breaks under a Space's runtime (app starts, process exits → RUNTIME_ERROR). gradio 6.10.0 is the version gradio's own ZeroGPU Server reference example ships and runs cleanly. It also resolves with the real model: MiniCPM-V 4.6 runs on transformers 5.x, which wants huggingface-hub 1.x, and 6.10.0 allows huggingface-hub <2.0,>=0.33.5 (i.e. hub 1.x). A gradio-SDK Space force-installs one gradio for the whole Space, so stub and real-model share it without a Docker Space — keep requirements.txt, requirements-model.txt and the Space sdk_version in lockstep. The smaller text fallbacks add no extra constraint.

On Apple Silicon (M1/M2/…), the default bf16 + MPS combo produces garbage output (a known MPS bf16 instability — not present on the Space's CUDA GPU). For a clean local real-model smoke test, force CPU/float32:

RECALL_STUB=0 RECALL_MODEL=1b RECALL_DTYPE=float32 RECALL_DEVICE=cpu python server.py

The model

Recall runs on openbmb/MiniCPM-V-4.6, an open multimodal model from OpenBMB chosen for the Backyard AI track: small enough to serve on a single Hugging Face ZeroGPU Space, capable enough to grade free-text answers, write grounded follow-up questions, and read scanned or photographed material directly. One model does both the text and the vision work.

Where the model is load-bearing. Three user-visible features are pure model work, not templated strings:

  • Grading — it compares your free-text answer to the reference answer and returns a 0–5 score, a plain-language explanation, and the specific concept you missed.
  • Adaptive follow-ups — from that missed concept it writes brand-new questions that drill exactly what you got wrong.
  • Vision / OCR — image-only or scanned PDFs that have no selectable text are rendered to images and read by the model directly to build the deck (content_pipeline.py), so slide photos and scans work, not just digital text.

How inference is served. Everything model-related goes through a single chat(messages, max_tokens) wrapper in llm.py; no other module imports transformers directly. The model is loaded once, lazily, on the Space's ZeroGPU — the multimodal default via MiniCPMV4_6ForConditionalGeneration + an AutoProcessor, the text-only fallbacks via AutoModelForCausalLM + AutoTokenizer — in bf16 with device_map="auto", and the GPU entrypoint is wrapped in @spaces.GPU. max_tokens is kept tight (256–512) because latency is the demo-killer. Model output is never trusted: replies expected to be JSON are parsed defensively, with one repair retry and a safe fallback so a malformed generation can never crash the study loop.

Stub mode. With RECALL_STUB=1 (the default) chat() returns canned replies, so the whole app runs and demos end-to-end with no GPU and no model download. Flip RECALL_STUB=0 to use the real model.

Fallback (config flip, no code change). If the Space is too slow or runs out of memory, swap to a smaller model by setting RECALL_MODEL — the rest of the pipeline is unchanged (the text-only fallbacks drop the image/OCR path):

# text fallback (8B)
RECALL_MODEL=8b RECALL_STUB=0 python server.py   # MiniCPM4.1-8B
# fast fallback
RECALL_MODEL=1b RECALL_STUB=0 python server.py   # MiniCPM5-1B
# mid fallback — ≤4B, so it qualifies for the Tiny Titan prize
RECALL_MODEL=4b RECALL_STUB=0 python server.py   # MiniCPM3-4B

Project layout

File Owner What it is
schema.py shared The data contract (Card, CardState, GradeResult, Session). Don't change without a sync.
llm.py Nikolai Shared MiniCPM inference wrapper + defensive JSON parsing.
learning_engine.py Nikolai Scheduling (SM-2-lite), grading, adaptation, follow-ups, recap.
content_pipeline.py Frank Text & image PDFs → chunks (scans render to page images for the vision model) → question cards.
app.py Arturo Gradio UI (Upload / Study / Recap) over gr.State — standalone fallback (python app.py).
server.py FastAPI server: serves the custom frontend + JSON API over the backend.
frontend/index.html The polished Recall design (Upload / Study / Recap), vanilla HTML/CSS/JS.

How to work in parallel

  1. At kickoff, lock schema.py together.
  2. Each module already ships working stubs — build your real logic behind the same function signatures, flip RECALL_STUB=0 to test for real.
  3. Don't change public function signatures without telling the team.

The judging hook

The small model is load-bearing in three visible places: grading free-text answers with explanations, generating follow-up questions that drill the exact concept you missed, and reading scanned/photographed material to build the deck. Make sure the demo shows them.