| # PaperProf: How We Fought Gradio, Won, and Built an AI Study Buddy in 10 Days | |
| *Field notes from the Build Small Hackathon, June 5–15, 2026* | |
| --- | |
| ## The Pitch | |
| Every student knows the ritual: it's 11 PM, the exam is tomorrow, and you're re-reading the same lecture PDF for the fourth time, *feeling* productive while learning absolutely nothing. Passive re-reading is one of the worst-performing study techniques in the learning-science literature. Active recall, which means forcing yourself to answer questions, is one of the best. | |
| So we built **PaperProf**: drop in any course PDF, and it becomes your personal professor. It reads the material, generates exam-style questions from it, grades your answers like a patient tutor, and even paints you a parting gift when you finish your session. | |
| **[Try it live on Hugging Face Spaces →](https://huggingface.co/spaces/build-small-hackathon/PaperProf)** | |
| Everything runs on free infrastructure with zero external API calls. No OpenAI key, no rate limits, no data leaving the machine. Just open-weight models doing honest work on a ZeroGPU slice. | |
| --- | |
| ## What It Does | |
| 1. **Upload a PDF**: lecture notes, a textbook chapter, slides, whatever you're cramming. | |
| 2. **PaperProf chunks it** into thematic sections and picks one at random. | |
| 3. **Choose your mode:** | |
| - **Open questions**: write a free-form answer, get structured tutor feedback: a verdict, what you got right, what you missed, and a model answer. | |
| - **MCQ**: four plausible options, instant client-side grading, and a one-sentence explanation for *every* choice, not just the right one. | |
| 4. **A score ring** tracks your session in real time. | |
| 5. **End the session** and FLUX.2-klein generates a unique image inspired by the topics you just studied, a small visual reward for showing up. | |
| The whole question-answer-feedback loop runs on **MiniCPM4.1-8B**, our QLoRA fine-tune of openbmb's latest 8B model, loaded once and shared between question generation and answer evaluation. | |
| ``` | |
PDF upload
  └──► parser.py: PyMuPDF text extraction
  └──► chunker.py: thematic chunking (min/max word caps)
  └──► questioner.py: MiniCPM4-8B writes ONE focused question
  └──► you answer
  └──► evaluator.py: the same model grades you like a tutor
  └──► image_gen.py: FLUX.2-klein paints your session
| ``` | |
| --- | |
| ## Badges Earned: 6 / 6 | |
| Build Small Hackathon awards merit badges for specific technical achievements. Here's where we landed: | |
| | Badge | Status | What it took | | |
| |---|---|---| | |
| **Off the Grid** | Earned | Zero external APIs: everything runs via ZeroGPU, no OpenAI key, no rate limits, no data leaving the machine |
| **Well-Tuned** | Earned | QLoRA fine-tune on SQuAD, model published at `build-small-hackathon/MiniCPM4-8B-PaperProf` |
| **Off-Brand** | Earned | Hand-built HTML/CSS/JS; Gradio is invisible and the UI is entirely custom (see Lesson 2 below) |
| **Llama Champion** | Earned | GGUF published at `build-small-hackathon/MiniCPM4-8B-PaperProf-GGUF`, llama.cpp CPU runtime wired in via `PAPERPROF_RUNTIME=llamacpp` |
| **Field Notes** | Earned | This post, plus the interactive `blog/index.html` site |
| **Sharing is Caring** | Earned | 12 LLM steps across 3 live sessions published as dataset `build-small-hackathon/PaperProf-traces` |
| --- | |
| ## The Real Story: 68 Commits of Lessons | |
| A hackathon README tells you what was built. The git log tells you what actually happened. Ours has 101 commits, and roughly two-thirds of them start with `fix:`. Here is the honest version. | |
| ### Lesson 1: Model choice is a compatibility problem, not a benchmark problem | |
| We started with MiniCPM3-4B, upgraded to MiniCPM4-8B for better reasoning, and immediately hit the classic open-model trap: the model card says one thing, the `transformers` version on your machine says another. | |
| ``` | |
| fix: pin transformers==4.57.1 for MiniCPM4-8B compatibility | |
| ``` | |
| One pinned version later, everything worked. The follow-up lesson came from quantization: bitsandbytes 4-bit is great on a 16 GB local GPU and *completely unnecessary* on ZeroGPU's hardware, so we made it conditional: | |
| ```python | |
# Excerpt from inside the function that builds the quantization config.
# HF Spaces (ZeroGPU): skip quantization, use bfloat16 directly
if os.environ.get("SPACE_ID"):
    return None
# Locally: 4-bit when VRAM < 17 GB
| ``` | |
| Same code, two deployment targets, zero config files. Detect the environment, adapt. | |
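| For readers who want the whole decision in one place, here is a hedged sketch of how that check might be completed. The `SPACE_ID` test and the 17 GB threshold come from the excerpt above; the function name `pick_quantization`, the `vram_gb` argument, and the NF4 settings are assumptions for illustration. | |

```python
import os

def pick_quantization(vram_gb, env=None):
    """Return kwargs for a 4-bit config, or None to load in bfloat16.

    Sketch only: the name, the vram_gb argument and the NF4 settings are
    assumptions; the SPACE_ID check and 17 GB threshold are from the post.
    """
    env = os.environ if env is None else env
    # HF Spaces sets SPACE_ID; on ZeroGPU we skip quantization entirely.
    if env.get("SPACE_ID"):
        return None
    # Locally: quantize to 4-bit only when the GPU has less than 17 GB.
    if vram_gb < 17:
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": "bfloat16",
        }
    return None
```

| A non-`None` result could be passed on as `transformers.BitsAndBytesConfig(**kwargs)`, with `vram_gb` read from `torch.cuda.get_device_properties(0).total_memory / 1024**3`. | |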
| ### Lesson 2: The Off-Brand badge nearly broke us (and taught us the most) | |
| The hackathon has an **Off-Brand** badge: ship a UI that doesn't look like the framework you built it with. We wanted PaperProf to look like a real product (glassmorphism, animated score ring, dark academia palette), not a Gradio demo. | |
| Attempt #1: restyle Gradio with CSS. We fought the theme system through *eleven consecutive commits* (`fix: CSS labels illisibles`, `fix: override variables CSS Gradio`, `fix: retire primary_hue orange qui changeait toutes les teintes`...). In English: unreadable CSS labels, overriding Gradio's CSS variables, and removing an orange `primary_hue` that was shifting every tint. Gradio's theming always had one more `!important` than we did. | |
| Attempt #2: nuke it from orbit. Docker SDK, FastAPI serving raw HTML, Gradio relegated to a backend. It worked locally and died on Spaces: we lost ZeroGPU integration, which only flows through the Gradio SDK. | |
| Attempt #3, the one that shipped: **the hidden-component bridge**. Keep Gradio as an invisible backend *inside the page*. Serve a fully custom HTML/CSS/JS interface through `gr.HTML`, hide every real Gradio component off-screen, and let a 300ms JavaScript polling loop ferry data between the two worlds. | |
| This pattern produced the three hardest-won discoveries of the hackathon: | |
| **`display: none` silently kills Gradio.** Components hidden that way never get their Svelte event handlers attached. The fix is the oldest trick in CSS: | |
| ```css | |
| /* collapsed but NOT display:none, so Gradio attaches event handlers */ | |
| #hidden-row-question { height: 0 !important; overflow: visible !important; } | |
| ``` | |
| **You can't `.click()` a Gradio button from JS.** Server-side rendering means the synthetic click goes nowhere. What *does* work: programmatically setting a hidden textbox's value through the native property descriptor, then dispatching `input`/`change` events so Svelte notices: | |
| ```javascript | |
| function setGradioTA(sel, val) { | |
| const el = document.querySelector(sel); | |
| Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value') | |
| .set.call(el, val); | |
| el.dispatchEvent(new Event('input', {bubbles: true})); | |
| el.dispatchEvent(new Event('change', {bubbles: true})); | |
| } | |
| ``` | |
| Every action in PaperProf (generate question, submit answer, new MCQ) is a timestamp written into a hidden textbox, picked up by a `.change()` listener on the Python side. Buttons that aren't buttons. | |
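| On the Python side, the listener only has to treat each new timestamp as "the button was pressed". A minimal sketch of that idea follows; `make_trigger_handler` is a hypothetical helper, not code from the project, and it keeps its state per process rather than per user session. | |

```python
def make_trigger_handler(action):
    """Wrap `action` so it runs once per new trigger value.

    Hypothetical helper: the hidden textbox carries a timestamp string
    written by the page's JavaScript; empty or repeated values are ignored.
    """
    last_seen = {"value": None}

    def handler(trigger_value, *args):
        # Change events can also arrive for empty or unchanged values.
        if not trigger_value or trigger_value == last_seen["value"]:
            return None  # placeholder for "no update"
        last_seen["value"] = trigger_value
        return action(*args)

    return handler
```

| Wiring it up would look roughly like `trigger.change(handler, inputs=[trigger, pdf_state], outputs=[question_box])`, where `trigger` is the hidden `gr.Textbox` and the other component names are placeholders. A real handler would return a no-op update for its outputs instead of `None`. | |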
| **MutationObserver loses to Svelte.** Gradio's reactive DOM updates don't always fire observers the way you'd expect. We surrendered and switched to a humble `setInterval` polling loop. Less elegant, infinitely more reliable. Sometimes the dumb solution is the senior solution. | |
| ### Lesson 3: ZeroGPU makes you think in seconds | |
| ZeroGPU gives you a serious GPU for free, but only in short windows, each scoped to a function decorated with `@spaces.GPU`. That budget reshapes your architecture: | |
| - **First-call cold starts are real.** Loading an 8B model takes ~60–90 s the first time. We built the UI to be honest about it: a live elapsed-time counter, escalating messages ("Model loading…", "Still loading… first call can take ~90s"), and a 3-minute hard timeout that unlocks the UI instead of spinning forever. | |
| - **Never download inside the GPU window.** FLUX.2-klein weighs ~16 GB. We prefetch it in a daemon thread at *startup*, so the `@spaces.GPU` window is spent generating, not downloading. We even skip a 7.75 GB duplicate ComfyUI checkpoint in the repo that diffusers never reads. | |
| - **Don't burn GPU time on things JavaScript can do.** MCQ grading needs no model call: the LLM emits a structured format once (`QUESTION:` / `A)`–`D)` / `CORRECT:` / `EXPLAIN_A:`…), we parse it into JSON, and the browser grades clicks instantly. Zero latency, zero GPU seconds. | |
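| The MCQ hand-off in the last bullet can be sketched as a small parser. This assumes the model writes one field per line; only the markers themselves (`QUESTION:`, `A)` to `D)`, `CORRECT:`, `EXPLAIN_A:` and so on) come from the format described above, while the function names and the exact layout are guesses. | |

```python
import json
import re

def parse_mcq(raw):
    """Parse the line-oriented MCQ format into a JSON-serialisable dict.

    Assumes one field per line; the layout beyond the named markers is a guess.
    """
    mcq = {"question": None, "options": {}, "correct": None, "explanations": {}}
    for line in raw.splitlines():
        line = line.strip()
        if m := re.match(r"QUESTION:\s*(.+)", line):
            mcq["question"] = m.group(1)
        elif m := re.match(r"([A-D])\)\s*(.+)", line):
            mcq["options"][m.group(1)] = m.group(2)
        elif m := re.match(r"CORRECT:\s*([A-D])", line):
            mcq["correct"] = m.group(1)
        elif m := re.match(r"EXPLAIN_([A-D]):\s*(.+)", line):
            mcq["explanations"][m.group(1)] = m.group(2)
    complete = (mcq["question"] and mcq["correct"]
                and len(mcq["options"]) == 4
                and len(mcq["explanations"]) == 4)
    if not complete:
        # Fail loudly so the caller can regenerate instead of shipping a broken quiz.
        raise ValueError("model output did not match the expected MCQ format")
    return mcq

def mcq_to_json(raw):
    """Serialise the parsed MCQ for the browser, which grades clicks locally."""
    return json.dumps(parse_mcq(raw))
```

| Because the correct letter and all four explanations travel with the question, the page needs no further server round-trip to grade a click. | |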
| ### Lesson 4: The bug that fired twice | |
| Late in the hackathon, our session-summary modal showed every MCQ answer **duplicated**: answer one question, see it counted twice, score 0/2. | |
| The cause was textbook event-handling: MCQ buttons had `btn.onclick = handler` assigned in the display function *and* an `addEventListener` registered by the global wiring function. One click, two handlers, two score increments. Our first fix removed the wrong one: the `addEventListener` had a timing flaw with its idempotency guard, so clicks then did *nothing at all*. The final fix kept the `onclick` (reassigned fresh with each question, inherently idempotent) and added a `mcqAnswered` re-entrancy guard for belt-and-suspenders. | |
| Moral: when two pieces of code both "helpfully" wire the same button, you don't have redundancy, you have a race. | |
| ### Lesson 5: Prompts are product decisions | |
| Small prompt details made the difference between "tech demo" and "usable study tool": | |
| - Early questions were rambling multi-part monsters. The fix was brutal constraint: *"ONE question only, on ONE concept. Maximum 25 words. No sub-questions, no 'and'."* | |
| - The evaluator follows a fixed 4-part structure (Verdict / What was good / What was missing / Model answer) so the frontend can parse and render it as styled sections. Prompt format *is* API contract. | |
| - With French source PDFs, the model kept drifting into French. Polite instructions lost to the gravitational pull of the context. What finally worked: `IMPORTANT: Always write in English, even if the source text is in another language`, stated twice, once at the top and once at the bottom of the prompt. With 8B models, subtlety is wasted; repetition is a feature. | |
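| Two of these points translate directly into code: the language rule repeated at both ends of the prompt, and the fixed feedback structure treated as a parseable contract. The sketch below is illustrative only; the helper names, the prompt layout, and the assumption that each section starts on its own line as `Label: ...` are ours, while the quoted instruction strings come from the bullets above. | |

```python
import re

LANGUAGE_RULE = ("IMPORTANT: Always write in English, even if the source "
                 "text is in another language")
SECTIONS = ["Verdict", "What was good", "What was missing", "Model answer"]

def build_question_prompt(chunk):
    """State the language rule at the top and again at the bottom."""
    return "\n\n".join([
        LANGUAGE_RULE,
        "ONE question only, on ONE concept. Maximum 25 words. "
        "No sub-questions, no 'and'.",
        "Source text:\n" + chunk,
        LANGUAGE_RULE,
    ])

def parse_feedback(text):
    """Split evaluator output into the four named sections.

    Assumes each section starts on its own line as 'Label: ...'.
    """
    pattern = r"^(%s):\s*" % "|".join(re.escape(s) for s in SECTIONS)
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # With one capture group, re.split yields
    # [preamble, label, body, label, body, ...].
    return {label: body.strip()
            for label, body in zip(parts[1::2], parts[2::2])}
```

| A frontend can then check that all four keys are present before rendering, and ask for a regeneration when the model drifts from the format. | |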
| --- | |
| ## What We'd Tell Past Us | |
| 1. **Read the git log of your own project sometimes.** Two-thirds `fix:` commits isn't failure; it's the actual texture of shipping. Each one was a lesson nobody had written down for us. | |
| 2. **Frameworks fight back hardest at the edges.** Using Gradio normally is easy. Using it as an invisible backend required understanding how it *actually* renders. The weird workarounds (`height:0`, textbox triggers, polling) are now reusable knowledge. | |
| 3. **Free infrastructure imposes honest engineering.** No API credits to hide behind means caring about cold starts, GPU seconds, and weight prefetching. Constraints made the architecture better. | |
| 4. **Client-side everything you can.** The MCQ mode is the snappiest feature in the app precisely because it never touches the server after generation. | |
| 5. **Ship the small thing.** PaperProf does one loop (read, ask, grade, encourage) and does it end-to-end. A hackathon project that completes one circle beats one that sketches five. | |
| --- | |
| ## The Stack | |
| | Layer | Choice | | |
| |---|---| | |
| | Q&A + evaluation | MiniCPM4.1-8B (openbmb), QLoRA fine-tune, bfloat16, transformers 4.57.1 | | |
| | Session images | FLUX.2-klein-4B (Black Forest Labs), diffusers | | |
| | PDF parsing | PyMuPDF | | |
| | Backend / hosting | Gradio 6 on Hugging Face Spaces, ZeroGPU | | |
| | Frontend | Hand-written HTML/CSS/JS over a hidden-Gradio bridge | | |
| External APIs | **None.** Fully off the grid. |
| --- | |
| *Built for the Build Small Hackathon, June 2026. The Space is live, so bring a PDF and let the professor grill you: [huggingface.co/spaces/build-small-hackathon/PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf)* | |