Mehdi


	# 📄 PaperProf: How We Fought Gradio, Won, and Built an AI Study Buddy in 10 Days

	Field notes from the Build Small Hackathon — June 5–15, 2026

	---

	## The Pitch

	Every student knows the ritual: it's 11 PM, the exam is tomorrow, and you're re-reading the same lecture PDF for the fourth time, feeling productive while learning absolutely nothing. Passive re-reading is one of the worst-performing study techniques in the learning-science literature. Active recall — forcing yourself to answer questions — is one of the best.

	So we built PaperProf: drop in any course PDF, and it becomes your personal professor. It reads the material, generates exam-style questions from it, grades your answers like a patient tutor, and even paints you a parting gift when you finish your session.

	[Try it live on Hugging Face Spaces →](https://huggingface.co/spaces/build-small-hackathon/PaperProf)

	Everything runs on free infrastructure with zero external API calls. No OpenAI key, no rate limits, no data leaving the machine. Just open-weight models doing honest work on a ZeroGPU slice.

	---

	## What It Does

	1. Upload a PDF — lecture notes, a textbook chapter, slides, whatever you're cramming.
	2. PaperProf chunks it into thematic sections and picks one at random.
3. Choose your mode:
   - Open questions: write a free-form answer and get structured tutor feedback (a verdict, what you got right, what you missed, and a model answer).
   - MCQ: four plausible options, instant client-side grading, and a one-sentence explanation for every choice, not just the right one.
	4. A score ring tracks your session in real time.
	5. End the session and FLUX.2-klein generates a unique image inspired by the topics you just studied — a small visual reward for showing up.

	The whole question-answer-feedback loop runs on MiniCPM4.1-8B, our QLoRA fine-tune of openbmb's latest 8B model, loaded once and shared between question generation and answer evaluation.

	```
	PDF upload
	└─► parser.py — PyMuPDF text extraction
	└─► chunker.py — thematic chunking (min/max word caps)
	└─► questioner.py — MiniCPM4-8B writes ONE focused question
	└─► you answer
	└─► evaluator.py — the same model grades you like a tutor
	└─► image_gen.py — FLUX.2-klein paints your session
	```
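
The source doesn't include `chunker.py`, so the following is only a minimal sketch of "thematic chunking with min/max word caps". It assumes that blank-line-separated paragraphs stand in for themes. The function names `chunk_text` and `pick_section` and the default caps of 80 and 400 words are hypothetical, not PaperProf's actual values.

```python
from __future__ import annotations

import random
import re


def chunk_text(text: str, min_words: int = 80, max_words: int = 400) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_words words,
    closing a chunk as soon as it reaches min_words."""
    # Assumption: blank lines separate paragraphs, and a paragraph approximates one theme.
    paragraphs = [p.split() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[list[str]] = []
    current: list[str] = []
    for words in paragraphs:
        # Split any paragraph longer than max_words into max_words-sized pieces.
        for i in range(0, len(words), max_words):
            piece = words[i:i + max_words]
            # Flush first if this piece would push the chunk past the hard cap.
            if current and len(current) + len(piece) > max_words:
                chunks.append(current)
                current = []
            current.extend(piece)
            # Flush once the chunk is long enough to stand on its own.
            if len(current) >= min_words:
                chunks.append(current)
                current = []
    if current:
        # Fold the short leftover into the previous chunk if that stays under the cap.
        if chunks and len(chunks[-1]) + len(current) <= max_words:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    # Words are re-joined with single spaces, so original line breaks are not preserved.
    return [" ".join(c) for c in chunks]


def pick_section(chunks: list[str], rng: random.Random | None = None) -> str:
    """Pick one chunk at random, mirroring the "picks one at random" step."""
    return (rng or random).choice(chunks)
```

Packing whole paragraphs keeps related sentences together, which helps when a single chunk has to carry one self-contained question; the hard cap keeps every chunk short enough to sit comfortably in an 8B model's prompt.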

	---

	## Badges Earned — 6 / 6

	Build Small Hackathon awards merit badges for specific technical achievements. Here's where we landed:

| Badge | Status | What it took |
|---|---|---|
| Off the Grid | ✅ Earned | Zero external APIs: everything runs via ZeroGPU, with no OpenAI key, no rate limits, and no data leaving the machine |
| Well-Tuned | ✅ Earned | QLoRA fine-tune on SQuAD, model published at `build-small-hackathon/MiniCPM4-8B-PaperProf` |
| Off-Brand | ✅ Earned | Hand-built HTML/CSS/JS; Gradio is invisible and the entire UI is 100% custom (see Lesson 2 below) |
| Llama Champion | ✅ Earned | GGUF published at `build-small-hackathon/MiniCPM4-8B-PaperProf-GGUF`, llama.cpp CPU runtime wired in via `PAPERPROF_RUNTIME=llamacpp` |
| Field Notes | ✅ Earned | This post, plus the interactive `blog/index.html` site |
| Sharing is Caring | ✅ Earned | 12 LLM steps across 3 live sessions published as dataset `build-small-hackathon/PaperProf-traces` |

	---

## The Real Story: 101 Commits of Lessons

	A hackathon README tells you what was built. The git log tells you what actually happened. Ours has 101 commits, and roughly two-thirds of them start with `fix:`. Here is the honest version.

	### Lesson 1 — Model choice is a compatibility problem, not a benchmark problem

	We started with MiniCPM3-4B, upgraded to MiniCPM4-8B for better reasoning, and immediately hit the classic open-model trap: the model card says one thing, the `transformers` version on your machine says another.

	```
	fix: pin transformers==4.57.1 for MiniCPM4-8B compatibility
	```

	One pinned version later, everything worked. The follow-up lesson came from quantization: bitsandbytes 4-bit is great on a 16 GB local GPU and completely unnecessary on ZeroGPU's hardware — so we made it conditional:

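```python
import os
import torch
from transformers import BitsAndBytesConfig
def pick_quantization():  # hypothetical helper name
    # HF Spaces (ZeroGPU): skip quantization, use bfloat16 directly
    if os.environ.get("SPACE_ID"):
        return None
    # Locally: 4-bit when VRAM < 17 GB
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return BitsAndBytesConfig(load_in_4bit=True) if vram_gb < 17 else None
```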

	Same code, two deployment targets, zero config files. Detect the environment, adapt.

	### Lesson 2 — The Off-Brand badge nearly broke us (and taught us the most)

	The hackathon has an Off-Brand badge: ship a UI that doesn't look like the framework you built it with. We wanted PaperProf to look like a real product — glassmorphism, animated score ring, dark academia palette — not a Gradio demo.

Attempt #1: restyle Gradio with CSS. We fought the theme system through eleven consecutive commits, among them `fix: CSS labels illisibles` (unreadable labels), `fix: override variables CSS Gradio` (override Gradio's CSS variables), and `fix: retire primary_hue orange qui changeait toutes les teintes` (remove the orange primary_hue that was changing every shade). Gradio's theming always had one more `!important` than we did.

	Attempt #2: nuke it from orbit. Docker SDK, FastAPI serving raw HTML, Gradio relegated to a backend. It worked locally and died on Spaces — we lost ZeroGPU integration, which only flows through the Gradio SDK.

	Attempt #3, the one that shipped: the hidden-component bridge. Keep Gradio as an invisible backend inside the page. Serve a fully custom HTML/CSS/JS interface through `gr.HTML`, hide every real Gradio component off-screen, and let a 300ms JavaScript polling loop ferry data between the two worlds.

	This pattern produced the three hardest-won discoveries of the hackathon:

	`display: none` silently kills Gradio. Components hidden that way never get their Svelte event handlers attached. The fix is the oldest trick in CSS:

	```css
	/* collapsed but NOT display:none, so Gradio attaches event handlers */
	#hidden-row-question { height: 0 !important; overflow: visible !important; }
	```

	You can't `.click()` a Gradio button from JS. Server-side rendering means the synthetic click goes nowhere. What does work: programmatically setting a hidden textbox's value through the native property descriptor, then dispatching `input`/`change` events so Svelte notices:

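```javascript
// Write a value into a hidden Gradio textarea so the Python side sees a change.
function setGradioTA(sel, val) {
  const el = document.querySelector(sel);
  if (!el) return; // component not mounted yet
  // Call the native value setter, then fire the events the framework listens for.
  Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value')
    .set.call(el, val);
  el.dispatchEvent(new Event('input', {bubbles: true}));
  el.dispatchEvent(new Event('change', {bubbles: true}));
}
```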

	Every action in PaperProf — generate question, submit answer, new MCQ — is a timestamp written into a hidden textbox, picked up by a `.change()` listener on the Python side. Buttons that aren't buttons.

	MutationObserver loses to Svelte. Gradio's reactive DOM updates don't always fire observers the way you'd expect. We surrendered and switched to a humble `setInterval` polling loop. Less elegant, infinitely more reliable. Sometimes the dumb solution is the senior solution.

	### Lesson 3 — ZeroGPU makes you think in seconds

	ZeroGPU gives you a serious GPU for free, but only in short decorated windows. That budget reshapes your architecture:

	- First-call cold starts are real. Loading an 8B model takes ~60–90s the first time. We built the UI to be honest about it: a live elapsed-time counter, escalating messages ("Model loading…", "Still loading… first call can take ~90s"), and a 3-minute hard timeout that unlocks the UI instead of spinning forever.
	- Never download inside the GPU window. FLUX.2-klein weighs ~16 GB. We prefetch it in a daemon thread at startup, so the `@spaces.GPU` window is spent generating, not downloading. We even skip a 7.75 GB duplicate ComfyUI checkpoint in the repo that diffusers never reads.
	- Don't burn GPU time on things JavaScript can do. MCQ grading needs no model call — the LLM emits a structured format once (`QUESTION:` / `A)`–`D)` / `CORRECT:` / `EXPLAIN_A:`…), we parse it into JSON, and the browser grades clicks instantly. Zero latency, zero GPU seconds.
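
As a rough illustration of the MCQ path, here is a sketch of parsing that labelled format into a JSON-ready dict. It assumes one field per line with exactly the labels listed above; the `parse_mcq` name and the output keys (`question`, `options`, `correct`, `explanations`) are hypothetical, not PaperProf's actual schema.

```python
import re

# Assumed layout, one labelled field per line:
#   QUESTION: ...  A) ... through D) ...  CORRECT: <letter>  EXPLAIN_A: ... through EXPLAIN_D: ...
LINE_RE = re.compile(r"^\s*(QUESTION:|[A-D]\)|CORRECT:|EXPLAIN_[A-D]:)\s*(.*)$")
LETTERS = {"A", "B", "C", "D"}


def parse_mcq(raw: str) -> dict:
    """Turn the model's labelled MCQ text into a JSON-ready dict."""
    mcq = {"question": "", "options": {}, "correct": "", "explanations": {}}
    for line in raw.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip any chatter the model adds around the format
        label, value = match.group(1), match.group(2).strip()
        if label == "QUESTION:":
            mcq["question"] = value
        elif label == "CORRECT:":
            mcq["correct"] = value[:1].upper()  # accepts "B" as well as "B) ..."
        elif label.startswith("EXPLAIN_"):
            mcq["explanations"][label[len("EXPLAIN_")]] = value
        else:
            mcq["options"][label[0]] = value  # "A)" -> "A"
    # Reject incomplete generations so the caller can retry instead of rendering junk.
    if not mcq["question"] or set(mcq["options"]) != LETTERS or mcq["correct"] not in LETTERS:
        raise ValueError("incomplete MCQ output")
    return mcq
```

`json.dumps(parse_mcq(raw))` then gives a payload the browser can grade on its own: checking an answer is just comparing the clicked letter with `correct`, so no further model call is needed.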

	### Lesson 4 — The bug that fired twice

	Late in the hackathon, our session-summary modal showed every MCQ answer duplicated: answer one question, see it counted twice, score 0/2.

The cause was textbook event handling: MCQ buttons had `btn.onclick = handler` assigned in the display function and an `addEventListener` registered by the global wiring function. One click, two handlers, two score increments. Our first fix removed the wrong one: the `addEventListener` had a timing flaw with its idempotency guard, so clicks then did nothing at all. The final fix kept the `onclick` (reassigned fresh with each question, inherently idempotent) and added an `mcqAnswered` re-entrancy guard as a belt-and-suspenders safeguard.

	Moral: when two pieces of code both "helpfully" wire the same button, you don't have redundancy — you have a race.

	### Lesson 5 — Prompts are product decisions

	Small prompt details made the difference between "tech demo" and "usable study tool":

	- Early questions were rambling multi-part monsters. The fix was brutal constraint: "ONE question only, on ONE concept. Maximum 25 words. No sub-questions, no 'and'."
	- The evaluator follows a fixed 4-part structure (Verdict / What was good / What was missing / Model answer) so the frontend can parse and render it as styled sections — prompt format is API contract.
	- With French source PDFs, the model kept drifting into French. Polite instructions lost to the gravitational pull of the context. What finally worked: `IMPORTANT: Always write in English, even if the source text is in another language` — stated twice, once at the top and once at the bottom of the prompt. With 8B models, subtlety is wasted; repetition is a feature.
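
The two prompt rules can be sketched in Python as a prompt builder that sandwiches the source text between two copies of the English instruction, plus a parser for the evaluator's four sections. The quoted rule strings come from this post; everything else (the function names, the `Source text:` wording, and the assumed `Section: text` layout of the evaluator output) is a hypothetical stand-in. PaperProf parses the sections in the frontend, so `parse_feedback` only illustrates the contract.

```python
import re

# Rule strings quoted from the post; the rest of the prompt wording is hypothetical.
ENGLISH_RULE = ("IMPORTANT: Always write in English, "
                "even if the source text is in another language")
QUESTION_RULE = ("ONE question only, on ONE concept. Maximum 25 words. "
                 "No sub-questions, no 'and'.")
SECTIONS = ["Verdict", "What was good", "What was missing", "Model answer"]
# Hypothetical evaluator layout: each section starts its own line as "<Section>: ...".
HEADER_RE = re.compile(r"^(%s):\s*" % "|".join(map(re.escape, SECTIONS)), re.MULTILINE)


def build_question_prompt(chunk: str) -> str:
    """State the English rule at the top and again at the bottom of the prompt."""
    return "\n\n".join([ENGLISH_RULE, QUESTION_RULE, f"Source text:\n{chunk}", ENGLISH_RULE])


def parse_feedback(raw: str) -> dict:
    """Split evaluator output into the four fixed sections; absent ones map to ''."""
    parts = HEADER_RE.split(raw)
    # With one capture group, re.split returns [preamble, name1, body1, name2, body2, ...].
    found = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    return {name: found.get(name, "") for name in SECTIONS}
```

Because the frontend only ever looks for those four headings, a missing section degrades to an empty string instead of breaking the render.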

	---

	## What We'd Tell Past Us

1. Read the git log of your own project sometimes. A log that is two-thirds `fix:` commits isn't failure; it's the actual texture of shipping. Each one was a lesson nobody had written down for us.
	2. Frameworks fight back hardest at the edges. Using Gradio normally is easy. Using it as an invisible backend required understanding how it actually renders. The weird workarounds (`height:0`, textbox triggers, polling) are now reusable knowledge.
	3. Free infrastructure imposes honest engineering. No API credits to hide behind means caring about cold starts, GPU seconds, and weight prefetching. Constraints made the architecture better.
	4. Client-side everything you can. The MCQ mode is the snappiest feature in the app precisely because it never touches the server after generation.
	5. Ship the small thing. PaperProf does one loop — read, ask, grade, encourage — and does it end-to-end. A hackathon project that completes one circle beats one that sketches five.

	---

	## The Stack

| Layer | Choice |
|---|---|
| Q&A + evaluation | MiniCPM4.1-8B (openbmb), QLoRA fine-tune, bfloat16, transformers 4.57.1 |
| Session images | FLUX.2-klein-4B (Black Forest Labs), diffusers |
| PDF parsing | PyMuPDF |
| Backend / hosting | Gradio 6 on Hugging Face Spaces, ZeroGPU |
| Frontend | Hand-written HTML/CSS/JS over a hidden-Gradio bridge |
| External APIs | None. 🔌 Fully off the grid. |

	---

	Built for the Build Small Hackathon, June 2026. The Space is live — bring a PDF and let the professor grill you: [huggingface.co/spaces/build-small-hackathon/PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf)