# 📄 PaperProf: How We Fought Gradio, Won, and Built an AI Study Buddy in 10 Days
*Field notes from the Build Small Hackathon, June 5–15, 2026*
---
## The Pitch
Every student knows the ritual: it's 11 PM, the exam is tomorrow, and you're re-reading the same lecture PDF for the fourth time, *feeling* productive while learning absolutely nothing. Passive re-reading is one of the worst-performing study techniques in the learning-science literature. Active recall (forcing yourself to answer questions) is one of the best.
So we built **PaperProf**: drop in any course PDF, and it becomes your personal professor. It reads the material, generates exam-style questions from it, grades your answers like a patient tutor, and even paints you a parting gift when you finish your session.
**[Try it live on Hugging Face Spaces →](https://huggingface.co/spaces/build-small-hackathon/PaperProf)**
Everything runs on free infrastructure with zero external API calls. No OpenAI key, no rate limits, no data leaving the machine. Just open-weight models doing honest work on a ZeroGPU slice.
---
## What It Does
1. **Upload a PDF**: lecture notes, a textbook chapter, slides, whatever you're cramming.
2. **PaperProf chunks it** into thematic sections and picks one at random.
3. **Choose your mode:**
   - **Open questions**: write a free-form answer and get structured tutor feedback (a verdict, what you got right, what you missed, and a model answer).
   - **MCQ**: four plausible options, instant client-side grading, and a one-sentence explanation for *every* choice, not just the right one.
4. **A score ring** tracks your session in real time.
5. **End the session** and FLUX.2-klein generates a unique image inspired by the topics you just studied, a small visual reward for showing up.
The whole question-answer-feedback loop runs on **MiniCPM4.1-8B**, our QLoRA fine-tune of openbmb's latest 8B model, loaded once and shared between question generation and answer evaluation.
```
PDF upload
  └─► parser.py: PyMuPDF text extraction
      └─► chunker.py: thematic chunking (min/max word caps)
          └─► questioner.py: MiniCPM4-8B writes ONE focused question
              └─► you answer
                  └─► evaluator.py: the same model grades you like a tutor
                      └─► image_gen.py: FLUX.2-klein paints your session
```
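The post describes `chunker.py` only as thematic chunking with min/max word caps, followed by a random pick. Here is a minimal sketch of that idea, assuming that paragraphs separated by blank lines can stand in for themes. The function names, the default caps, and the greedy packing strategy are illustrative assumptions, not PaperProf's actual code.

```python
import random
import re


def chunk_text(text, min_words=80, max_words=400):
    """Greedy paragraph packing (an assumed stand-in for thematic chunking)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        # Close the running chunk once it is long enough and the next
        # paragraph would push it past the upper cap.
        if len(current) >= min_words and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
        # An oversized run is cut into slices of exactly max_words.
        while len(current) > max_words:
            chunks.append(" ".join(current[:max_words]))
            current = current[max_words:]
    if current:
        if chunks and len(current) < min_words:
            # Fold a short tail into the previous chunk; that chunk may
            # then exceed max_words by fewer than min_words words.
            chunks[-1] = chunks[-1] + " " + " ".join(current)
        else:
            chunks.append(" ".join(current))
    return chunks


def pick_random_chunk(chunks, seed=None):
    """Pick one chunk at random; pass a seed for reproducible sessions."""
    return random.Random(seed).choice(chunks)
```

A real thematic chunker would likely also use headings or topic shifts; this sketch only enforces the word caps.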
---
## Badges Earned: 6 / 6
Build Small Hackathon awards merit badges for specific technical achievements. Here's where we landed:
| Badge | Status | What it took |
|---|---|---|
| **Off the Grid** | ✅ Earned | Zero external APIs: everything runs via ZeroGPU, no OpenAI key, no rate limits, no data leaving the machine |
| **Well-Tuned** | ✅ Earned | QLoRA fine-tune on SQuAD, model published at `build-small-hackathon/MiniCPM4-8B-PaperProf` |
| **Off-Brand** | ✅ Earned | Hand-built HTML/CSS/JS: Gradio is invisible, the entire UI is 100% custom (see Lesson 2 below) |
| **Llama Champion** | ✅ Earned | GGUF published at `build-small-hackathon/MiniCPM4-8B-PaperProf-GGUF`, llama.cpp CPU runtime wired in via `PAPERPROF_RUNTIME=llamacpp` |
| **Field Notes** | ✅ Earned | This post, plus the interactive `blog/index.html` site |
| **Sharing is Caring** | ✅ Earned | 12 LLM steps across 3 live sessions published as dataset `build-small-hackathon/PaperProf-traces` |
---
## The Real Story: 101 Commits of Lessons
A hackathon README tells you what was built. The git log tells you what actually happened. Ours has 101 commits, and roughly two-thirds of them start with `fix:`. Here is the honest version.
### Lesson 1: Model choice is a compatibility problem, not a benchmark problem
We started with MiniCPM3-4B, upgraded to MiniCPM4-8B for better reasoning, and immediately hit the classic open-model trap: the model card says one thing, the `transformers` version on your machine says another.
```
fix: pin transformers==4.57.1 for MiniCPM4-8B compatibility
```
One pinned version later, everything worked. The follow-up lesson came from quantization: bitsandbytes 4-bit is great on a 16 GB local GPU and *completely unnecessary* on ZeroGPU's hardware, so we made it conditional:
```python
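import os

def quantization_config():  # hypothetical wrapper name; the excerpt shows only the body
    # HF Spaces (ZeroGPU): skip quantization, use bfloat16 directly
    if os.environ.get("SPACE_ID"):
        return None
    # Locally: 4-bit when VRAM < 17 GB
    ...  # local 4-bit branch is not shown in this excerpt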
```
Same code, two deployment targets, zero config files. Detect the environment, adapt.
### Lesson 2: The Off-Brand badge nearly broke us (and taught us the most)
The hackathon has an **Off-Brand** badge: ship a UI that doesn't look like the framework you built it with. We wanted PaperProf to look like a real product (glassmorphism, animated score ring, dark academia palette), not a Gradio demo.
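Attempt #1: restyle Gradio with CSS. We fought the theme system through *eleven consecutive commits* (`fix: CSS labels illisibles`, `fix: override variables CSS Gradio`, `fix: retire primary_hue orange qui changeait toutes les teintes`...), which translate roughly to fixing unreadable CSS labels, overriding Gradio's CSS variables, and removing an orange `primary_hue` that shifted every tint. Gradio's theming always had one more `!important` than we did.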
Attempt #2: nuke it from orbit. Docker SDK, FastAPI serving raw HTML, Gradio relegated to a backend. It worked locally and died on Spaces: we lost ZeroGPU integration, which only flows through the Gradio SDK.
Attempt #3, the one that shipped: **the hidden-component bridge**. Keep Gradio as an invisible backend *inside the page*. Serve a fully custom HTML/CSS/JS interface through `gr.HTML`, hide every real Gradio component off-screen, and let a 300ms JavaScript polling loop ferry data between the two worlds.
This pattern produced the three hardest-won discoveries of the hackathon:
**`display: none` silently kills Gradio.** Components hidden that way never get their Svelte event handlers attached. The fix is the oldest trick in CSS:
```css
/* collapsed but NOT display:none, so Gradio attaches event handlers */
#hidden-row-question { height: 0 !important; overflow: visible !important; }
```
**You can't `.click()` a Gradio button from JS.** Server-side rendering means the synthetic click goes nowhere. What *does* work: programmatically setting a hidden textbox's value through the native property descriptor, then dispatching `input`/`change` events so Svelte notices:
```javascript
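function setGradioTA(sel, val) {
  const el = document.querySelector(sel);
  // Call the native value setter directly, then notify listeners below
  Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value')
    .set.call(el, val);
  // Bubbling input/change events are what make Svelte notice the new value
  el.dispatchEvent(new Event('input', {bubbles: true}));
  el.dispatchEvent(new Event('change', {bubbles: true}));
}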
```
Every action in PaperProf (generate question, submit answer, new MCQ) is a timestamp written into a hidden textbox, picked up by a `.change()` listener on the Python side. Buttons that aren't buttons.
**MutationObserver loses to Svelte.** Gradio's reactive DOM updates don't always fire observers the way you'd expect. We surrendered and switched to a humble `setInterval` polling loop. Less elegant, infinitely more reliable. Sometimes the dumb solution is the senior solution.
### Lesson 3: ZeroGPU makes you think in seconds
ZeroGPU gives you a serious GPU for free, but only in short windows opened by the `@spaces.GPU` decorator. That budget reshapes your architecture:
- **First-call cold starts are real.** Loading an 8B model takes ~60–90s the first time. We built the UI to be honest about it: a live elapsed-time counter, escalating messages ("Model loading…", "Still loading… first call can take ~90s"), and a 3-minute hard timeout that unlocks the UI instead of spinning forever.
- **Never download inside the GPU window.** FLUX.2-klein weighs ~16 GB. We prefetch it in a daemon thread at *startup*, so the `@spaces.GPU` window is spent generating, not downloading. We even skip a 7.75 GB duplicate ComfyUI checkpoint in the repo that diffusers never reads.
- **Don't burn GPU time on things JavaScript can do.** MCQ grading needs no model call: the LLM emits a structured format once (`QUESTION:` / `A)`–`D)` / `CORRECT:` / `EXPLAIN_A:`…), we parse it into JSON, and the browser grades clicks instantly. Zero latency, zero GPU seconds.
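The parse step in the last bullet can be sketched in a few lines of Python. This is a hedged illustration, not PaperProf's parser: it assumes each marker sits on its own line, and the names `parse_mcq` and `mcq_to_json` and the shape of the resulting dictionary are invented for the example.

```python
import json
import re

# Assumed line shapes: "A) option text" and "EXPLAIN_A: explanation text".
OPTION_RE = re.compile(r"^([A-D])\)\s*(.+)$")
EXPLAIN_RE = re.compile(r"^EXPLAIN_([A-D]):\s*(.+)$")


def parse_mcq(raw):
    """Turn the model's structured MCQ text into a plain dictionary."""
    mcq = {"question": None, "options": {}, "correct": None, "explanations": {}}
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("QUESTION:"):
            mcq["question"] = line[len("QUESTION:"):].strip()
        elif line.startswith("CORRECT:"):
            # Keep only the letter, so "B" and "B)" both work.
            mcq["correct"] = line[len("CORRECT:"):].strip()[:1].upper()
        elif (m := OPTION_RE.match(line)):
            mcq["options"][m.group(1)] = m.group(2).strip()
        elif (m := EXPLAIN_RE.match(line)):
            mcq["explanations"][m.group(1)] = m.group(2).strip()
    # Reject malformed generations instead of shipping them to the browser.
    if (not mcq["question"]
            or sorted(mcq["options"]) != ["A", "B", "C", "D"]
            or mcq["correct"] not in mcq["options"]):
        raise ValueError("model output does not match the expected MCQ format")
    return mcq


def mcq_to_json(raw):
    """Serialize the parsed MCQ so the frontend can grade clicks locally."""
    return json.dumps(parse_mcq(raw))
```

Raising on malformed output gives the caller a clear point to retry generation, so a broken question never reaches the client.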
### Lesson 4: The bug that fired twice
Late in the hackathon, our session-summary modal showed every MCQ answer **duplicated**: answer one question, see it counted twice, score 0/2.
The cause was textbook event-handling: MCQ buttons had `btn.onclick = handler` assigned in the display function *and* an `addEventListener` registered by the global wiring function. One click, two handlers, two score increments. Our first fix removed the wrong one: it dropped the `onclick` and kept the `addEventListener`, whose idempotency guard had a timing flaw, so clicks then did *nothing at all*. The final fix kept the `onclick` (reassigned fresh with each question, inherently idempotent) and added a `mcqAnswered` re-entrancy guard as a belt-and-suspenders measure.
Moral: when two pieces of code both "helpfully" wire the same button, you don't have redundancy; you have a race.
### Lesson 5: Prompts are product decisions
Small prompt details made the difference between "tech demo" and "usable study tool":
- Early questions were rambling multi-part monsters. The fix was brutal constraint: *"ONE question only, on ONE concept. Maximum 25 words. No sub-questions, no 'and'."*
- The evaluator follows a fixed 4-part structure (Verdict / What was good / What was missing / Model answer) so the frontend can parse and render it as styled sections: prompt format *is* API contract.
- With French source PDFs, the model kept drifting into French. Polite instructions lost to the gravitational pull of the context. What finally worked: `IMPORTANT: Always write in English, even if the source text is in another language`, stated twice, once at the top and once at the bottom of the prompt. With 8B models, subtlety is wasted; repetition is a feature.
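As a rough illustration of the top-and-bottom repetition, here is a sketch of a question prompt that sandwiches the source excerpt between two copies of the language rule. The two rule strings are quoted from this post. The function name, the task sentence, and the excerpt delimiters are assumptions made for the example, not the shipped prompt.

```python
ENGLISH_RULE = (
    "IMPORTANT: Always write in English, even if the source text "
    "is in another language"
)
QUESTION_RULE = (
    "ONE question only, on ONE concept. Maximum 25 words. "
    "No sub-questions, no 'and'."
)


def build_question_prompt(chunk):
    """Assemble a prompt with the language rule at both ends (hypothetical layout)."""
    parts = [
        ENGLISH_RULE,  # first copy, before the model reads any source text
        "Write an exam-style question about the course excerpt below.",
        QUESTION_RULE,
        "--- EXCERPT ---",
        chunk.strip(),
        "--- END EXCERPT ---",
        ENGLISH_RULE,  # second copy, the last thing the model reads
    ]
    return "\n\n".join(parts)
```

Placing the second copy after the excerpt means the instruction is the most recent text in context when generation starts, which is the effect described above.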
---
## What We'd Tell Past Us
1. **Read the git log of your own project sometimes.** Two-thirds `fix:` commits isn't failure; it's the actual texture of shipping. Each one was a lesson nobody had written down for us.
2. **Frameworks fight back hardest at the edges.** Using Gradio normally is easy. Using it as an invisible backend required understanding how it *actually* renders. The weird workarounds (`height:0`, textbox triggers, polling) are now reusable knowledge.
3. **Free infrastructure imposes honest engineering.** No API credits to hide behind means caring about cold starts, GPU seconds, and weight prefetching. Constraints made the architecture better.
4. **Client-side everything you can.** The MCQ mode is the snappiest feature in the app precisely because it never touches the server after generation.
5. **Ship the small thing.** PaperProf does one loop (read, ask, grade, encourage) and does it end-to-end. A hackathon project that completes one circle beats one that sketches five.
---
## The Stack
| Layer | Choice |
|---|---|
| Q&A + evaluation | MiniCPM4.1-8B (openbmb), QLoRA fine-tune, bfloat16, transformers 4.57.1 |
| Session images | FLUX.2-klein-4B (Black Forest Labs), diffusers |
| PDF parsing | PyMuPDF |
| Backend / hosting | Gradio 6 on Hugging Face Spaces, ZeroGPU |
| Frontend | Hand-written HTML/CSS/JS over a hidden-Gradio bridge |
| External APIs | **None.** 🔌 Fully off the grid. |
---
*Built for the Build Small Hackathon, June 2026. The Space is live, so bring a PDF and let the professor grill you: [huggingface.co/spaces/build-small-hackathon/PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf)*