# 📄 PaperProf: How We Fought Gradio, Won, and Built an AI Study Buddy in 10 Days

*Field notes from the Build Small Hackathon — June 5–15, 2026*

---

## The Pitch

Every student knows the ritual: it's 11 PM, the exam is tomorrow, and you're re-reading the same lecture PDF for the fourth time, *feeling* productive while learning absolutely nothing. Passive re-reading is one of the worst-performing study techniques in the learning-science literature. Active recall — forcing yourself to answer questions — is one of the best.

So we built **PaperProf**: drop in any course PDF, and it becomes your personal professor. It reads the material, generates exam-style questions from it, grades your answers like a patient tutor, and even paints you a parting gift when you finish your session.

**[Try it live on Hugging Face Spaces →](https://huggingface.co/spaces/build-small-hackathon/PaperProf)**

Everything runs on free infrastructure with zero external API calls. No OpenAI key, no rate limits, no data leaving the machine. Just open-weight models doing honest work on a ZeroGPU slice.

---

## What It Does

1. **Upload a PDF** — lecture notes, a textbook chapter, slides, whatever you're cramming.
2. **PaperProf chunks it** into thematic sections and picks one at random.
3. **Choose your mode:**
   - **Open questions** — write a free-form answer, get structured tutor feedback: a verdict, what you got right, what you missed, and a model answer.
   - **MCQ** — four plausible options, instant client-side grading, and a one-sentence explanation for *every* choice, not just the right one.
4. **A score ring** tracks your session in real time.
5. **End the session** and FLUX.2-klein generates a unique image inspired by the topics you just studied — a small visual reward for showing up.

The whole question-answer-feedback loop runs on our QLoRA fine-tune of openbmb's **MiniCPM4-8B**, loaded once and shared between question generation and answer evaluation.

```
PDF upload
 └─► parser.py      — PyMuPDF text extraction
      └─► chunker.py — thematic chunking (min/max word caps)
           └─► questioner.py — MiniCPM4-8B writes ONE focused question
                └─► you answer
                     └─► evaluator.py — the same model grades you like a tutor
                          └─► image_gen.py — FLUX.2-klein paints your session
```
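
The chunking step in the diagram can be pictured with a minimal sketch. It assumes blank lines mark thematic boundaries and that the caps are counted in words; the names `chunk_text` and `pick_chunk` and the default cap values are hypothetical, not taken from `chunker.py`.

```python
import random
import re


def chunk_text(text, min_words=120, max_words=400):
    """Group paragraphs into chunks of roughly min_words..max_words words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        # Hard-split any single paragraph that is longer than the cap.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Flush early if adding this piece would exceed the cap.
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)
            # Close the chunk as soon as it is long enough.
            if len(current) >= min_words:
                chunks.append(" ".join(current))
                current = []
    if current:
        # A short tail joins the previous chunk only if the cap still holds.
        if chunks and len(chunks[-1].split()) + len(current) <= max_words:
            chunks[-1] += " " + " ".join(current)
        else:
            chunks.append(" ".join(current))
    return chunks


def pick_chunk(chunks, rng=random):
    """Pick one chunk at random to ask a question about."""
    return rng.choice(chunks)
```

No chunk exceeds `max_words`; a chunk can still fall below `min_words` when the next paragraph would not fit, which seems an acceptable trade for never cutting a section off at the cap.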

---

## Badges Earned — 6 / 6

Build Small Hackathon awards merit badges for specific technical achievements. Here's where we landed:

| Badge | Status | What it took |
|---|---|---|
| **Off the Grid** | ✅ Earned | Zero external APIs — everything runs via ZeroGPU, no OpenAI key, no rate limits, no data leaving the machine |
| **Well-Tuned** | ✅ Earned | QLoRA fine-tune on SQuAD, model published at `build-small-hackathon/MiniCPM4-8B-PaperProf` |
| **Off-Brand** | ✅ Earned | Hand-built HTML/CSS/JS — Gradio is invisible, the entire UI is 100% custom (see Lesson 2 below) |
| **Llama Champion** | ✅ Earned | GGUF published at `build-small-hackathon/MiniCPM4-8B-PaperProf-GGUF`, llama.cpp CPU runtime wired in via `PAPERPROF_RUNTIME=llamacpp` |
| **Field Notes** | ✅ Earned | This post, plus the interactive `blog/index.html` site |
| **Sharing is Caring** | ✅ Earned | 12 LLM steps across 3 live sessions published as dataset `build-small-hackathon/PaperProf-traces` |

---

## The Real Story: 68 Commits of Lessons

A hackathon README tells you what was built. The git log tells you what actually happened. Ours has 101 commits, and 68 of them, roughly two-thirds, start with `fix:`. Here is the honest version.

### Lesson 1 — Model choice is a compatibility problem, not a benchmark problem

We started with MiniCPM3-4B, upgraded to MiniCPM4-8B for better reasoning, and immediately hit the classic open-model trap: the model card says one thing, the `transformers` version on your machine says another.

```
fix: pin transformers==4.57.1 for MiniCPM4-8B compatibility
```

One pinned version later, everything worked. The follow-up lesson came from quantization: bitsandbytes 4-bit is great on a 16 GB local GPU and *completely unnecessary* on ZeroGPU's hardware — so we made it conditional:

```python
# HF Spaces (ZeroGPU): skip quantization, use bfloat16 directly
if os.environ.get("SPACE_ID"):
    return None
# Locally: 4-bit when VRAM < 17 GB
```

Same code, two deployment targets, zero config files. Detect the environment, adapt.
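
For readers who want the whole decision in one place, here is a minimal sketch of how that check might be completed. The function name `pick_quantization`, its parameters, and the shape of the returned dict are assumptions for illustration; only the `SPACE_ID` test and the 17 GB threshold come from the excerpt above.

```python
import os

VRAM_THRESHOLD_GB = 17  # from the excerpt: 4-bit when VRAM < 17 GB


def pick_quantization(vram_gb, env=None):
    """Return None to load in bfloat16, or a dict of 4-bit settings.

    The dict shape is hypothetical; a real loader would translate it
    into whatever its quantization backend expects.
    """
    env = os.environ if env is None else env
    # On HF Spaces (ZeroGPU) SPACE_ID is set: skip quantization entirely.
    if env.get("SPACE_ID"):
        return None
    # Locally, quantize only when the GPU is below the threshold.
    if vram_gb < VRAM_THRESHOLD_GB:
        return {"load_in_4bit": True, "compute_dtype": "bfloat16"}
    return None
```

Passing `env` explicitly keeps the function testable without touching the real environment. In an actual loader the returned settings would presumably feed something like transformers' `BitsAndBytesConfig(load_in_4bit=True, ...)`.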

### Lesson 2 — The Off-Brand badge nearly broke us (and taught us the most)

The hackathon has an **Off-Brand** badge: ship a UI that doesn't look like the framework you built it with. We wanted PaperProf to look like a real product — glassmorphism, animated score ring, dark academia palette — not a Gradio demo.

Attempt #1: restyle Gradio with CSS. We fought the theme system through *eleven consecutive commits* (`fix: CSS labels illisibles`, `fix: override variables CSS Gradio`, `fix: retire primary_hue orange qui changeait toutes les teintes`...). In English: unreadable labels, overriding Gradio's CSS variables, and removing an orange `primary_hue` that was changing every tint. Gradio's theming always had one more `!important` than we did.

Attempt #2: nuke it from orbit. Docker SDK, FastAPI serving raw HTML, Gradio relegated to a backend. It worked locally and died on Spaces — we lost ZeroGPU integration, which only flows through the Gradio SDK.

Attempt #3, the one that shipped: **the hidden-component bridge**. Keep Gradio as an invisible backend *inside the page*. Serve a fully custom HTML/CSS/JS interface through `gr.HTML`, hide every real Gradio component off-screen, and let a 300ms JavaScript polling loop ferry data between the two worlds.

This pattern produced the three hardest-won discoveries of the hackathon:

**`display: none` silently kills Gradio.** Components hidden that way never get their Svelte event handlers attached. The fix is the oldest trick in CSS:

```css
/* collapsed but NOT display:none, so Gradio attaches event handlers */
#hidden-row-question { height: 0 !important; overflow: hidden !important; }
```

**You can't `.click()` a Gradio button from JS.** Server-side rendering means the synthetic click goes nowhere. What *does* work: programmatically setting a hidden textbox's value through the native property descriptor, then dispatching `input`/`change` events so Svelte notices:

```javascript
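// Set a hidden Gradio textarea's value so that Svelte registers the change.
function setGradioTA(sel, val) {
  const el = document.querySelector(sel);
  if (!el) return false;  // component not mounted yet; the caller can retry
  // Call the native setter directly, bypassing any per-instance override.
  Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value')
        .set.call(el, val);
  // Fire the events that Gradio's listeners react to.
  el.dispatchEvent(new Event('input',  {bubbles: true}));
  el.dispatchEvent(new Event('change', {bubbles: true}));
  return true;
}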
```

Every action in PaperProf — generate question, submit answer, new MCQ — is a timestamp written into a hidden textbox, picked up by a `.change()` listener on the Python side. Buttons that aren't buttons.

**MutationObserver loses to Svelte.** Gradio's reactive DOM updates don't always fire observers the way you'd expect. We surrendered and switched to a humble `setInterval` polling loop. Less elegant, infinitely more reliable. Sometimes the dumb solution is the senior solution.

### Lesson 3 — ZeroGPU makes you think in seconds

ZeroGPU gives you a serious GPU for free, but only inside short windows opened by the `@spaces.GPU` decorator. That budget reshapes your architecture:

- **First-call cold starts are real.** Loading an 8B model takes ~60–90s the first time. We built the UI to be honest about it: a live elapsed-time counter, escalating messages ("Model loading…", "Still loading… first call can take ~90s"), and a 3-minute hard timeout that unlocks the UI instead of spinning forever.
- **Never download inside the GPU window.** FLUX.2-klein weighs ~16 GB. We prefetch it in a daemon thread at *startup*, so the `@spaces.GPU` window is spent generating, not downloading. We even skip a 7.75 GB duplicate ComfyUI checkpoint in the repo that diffusers never reads.
- **Don't burn GPU time on things JavaScript can do.** MCQ grading needs no model call — the LLM emits a structured format once (`QUESTION:` / `A)`–`D)` / `CORRECT:` / `EXPLAIN_A:`…), we parse it into JSON, and the browser grades clicks instantly. Zero latency, zero GPU seconds.
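
The MCQ point is easiest to see in code. Below is a minimal sketch of parsing that line-based format into JSON, assuming one field per line and option labels written exactly as `A)` to `D)`. The function names, the dict keys, and the choice to treat explanations as optional are assumptions, not the app's actual parser.

```python
import json
import re

OPTION_RE = re.compile(r"^([A-D])\)\s*(.+)$")
EXPLAIN_RE = re.compile(r"^EXPLAIN_([A-D]):\s*(.+)$")


def parse_mcq(raw):
    """Parse the model's line-based MCQ output into a dict; raise on gaps."""
    mcq = {"question": None, "options": {}, "correct": None, "explanations": {}}
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("QUESTION:"):
            mcq["question"] = line[len("QUESTION:"):].strip()
        elif line.startswith("CORRECT:"):
            # Keep only the letter, so "B" and "B)" are both accepted.
            mcq["correct"] = line[len("CORRECT:"):].strip()[:1].upper()
        elif (m := EXPLAIN_RE.match(line)):
            mcq["explanations"][m.group(1)] = m.group(2).strip()
        elif (m := OPTION_RE.match(line)):
            mcq["options"][m.group(1)] = m.group(2).strip()
    letters = {"A", "B", "C", "D"}
    if (not mcq["question"] or set(mcq["options"]) != letters
            or mcq["correct"] not in letters):
        raise ValueError("model output did not match the MCQ format")
    return mcq


def mcq_to_json(raw):
    """Serialize the parsed MCQ so the browser can grade clicks on its own."""
    return json.dumps(parse_mcq(raw))
```

Once the browser holds this payload, grading a click is a string comparison against `correct`, which is why no further model call is needed. Raising on malformed output lets the caller regenerate instead of showing a broken question.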

### Lesson 4 — The bug that fired twice

Late in the hackathon, our session-summary modal showed every MCQ answer **duplicated**: answer one question, see it counted twice, score 0/2.

The cause was textbook event handling: MCQ buttons had `btn.onclick = handler` assigned in the display function *and* an `addEventListener` registered by the global wiring function. One click, two handlers, two score increments. Our first fix removed the wrong one: it dropped the `onclick` and kept the `addEventListener`, whose idempotency guard had a timing flaw, so clicks then did *nothing at all*. The final fix kept the `onclick` (reassigned fresh with each question, inherently idempotent), dropped the listener instead, and added a `mcqAnswered` re-entrancy guard for belt-and-suspenders.

Moral: when two pieces of code both "helpfully" wire the same button, you don't have redundancy — you have a race.

### Lesson 5 — Prompts are product decisions

Small prompt details made the difference between "tech demo" and "usable study tool":

- Early questions were rambling multi-part monsters. The fix was brutal constraint: *"ONE question only, on ONE concept. Maximum 25 words. No sub-questions, no 'and'."*
- The evaluator follows a fixed 4-part structure (Verdict / What was good / What was missing / Model answer) so the frontend can parse and render it as styled sections — prompt format *is* API contract.
- With French source PDFs, the model kept drifting into French. Polite instructions lost to the gravitational pull of the context. What finally worked: `IMPORTANT: Always write in English, even if the source text is in another language` — stated twice, once at the top and once at the bottom of the prompt. With 8B models, subtlety is wasted; repetition is a feature.
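
To make the first and last points concrete, here is a minimal sketch of a question prompt that states the language rule at both ends. The two rule strings are quoted from the bullets above; the function name, the framing sentence, and the `MATERIAL:` label are assumptions rather than the prompt actually used in `questioner.py`.

```python
LANGUAGE_RULE = (
    "IMPORTANT: Always write in English, "
    "even if the source text is in another language"
)
QUESTION_RULE = (
    "ONE question only, on ONE concept. Maximum 25 words. "
    "No sub-questions, no 'and'."
)


def build_question_prompt(chunk):
    """Wrap a source chunk with the constraints, language rule first and last."""
    return "\n\n".join([
        LANGUAGE_RULE,  # stated once at the top...
        "Write an exam-style question about the course material below.",
        QUESTION_RULE,
        "MATERIAL:\n" + chunk.strip(),
        LANGUAGE_RULE,  # ...and once more after the source text
    ])
```

Placing the second copy after the material means the last thing the model reads before generating is the instruction, not a page of French.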

---

## What We'd Tell Past Us

1. **Read the git log of your own project sometimes.** Two-thirds `fix:` commits isn't failure — it's the actual texture of shipping. Each one was a lesson nobody had written down for us.
2. **Frameworks fight back hardest at the edges.** Using Gradio normally is easy. Using it as an invisible backend required understanding how it *actually* renders. The weird workarounds (`height:0`, textbox triggers, polling) are now reusable knowledge.
3. **Free infrastructure imposes honest engineering.** No API credits to hide behind means caring about cold starts, GPU seconds, and weight prefetching. Constraints made the architecture better.
4. **Client-side everything you can.** The MCQ mode is the snappiest feature in the app precisely because it never touches the server after generation.
5. **Ship the small thing.** PaperProf does one loop — read, ask, grade, encourage — and does it end-to-end. A hackathon project that completes one circle beats one that sketches five.

---

## The Stack

| Layer | Choice |
|---|---|
| Q&A + evaluation | MiniCPM4-8B (openbmb), QLoRA fine-tune, bfloat16, transformers 4.57.1 |
| Session images | FLUX.2-klein-4B (Black Forest Labs), diffusers |
| PDF parsing | PyMuPDF |
| Backend / hosting | Gradio 6 on Hugging Face Spaces, ZeroGPU |
| Frontend | Hand-written HTML/CSS/JS over a hidden-Gradio bridge |
| External APIs | **None.** 🔌 Fully off the grid. |

---

*Built for the Build Small Hackathon, June 2026. The Space is live — bring a PDF and let the professor grill you: [huggingface.co/spaces/build-small-hackathon/PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf)*