# 📄 PaperProf: How We Fought Gradio, Won, and Built an AI Study Buddy in 10 Days

*Field notes from the Build Small Hackathon, June 5–15, 2026*

---

## The Pitch

Every student knows the ritual: it's 11 PM, the exam is tomorrow, and you're re-reading the same lecture PDF for the fourth time, *feeling* productive while learning absolutely nothing. Passive re-reading is one of the worst-performing study techniques in the learning-science literature. Active recall (forcing yourself to answer questions) is one of the best.

So we built **PaperProf**: drop in any course PDF, and it becomes your personal professor. It reads the material, generates exam-style questions from it, grades your answers like a patient tutor, and even paints you a parting gift when you finish your session.

**[Try it live on Hugging Face Spaces →](https://huggingface.co/spaces/build-small-hackathon/PaperProf)**

Everything runs on free infrastructure with zero external API calls. No OpenAI key, no rate limits, no data leaving the machine. Just open-weight models doing honest work on a ZeroGPU slice.

---

## What It Does

1. **Upload a PDF**: lecture notes, a textbook chapter, slides, whatever you're cramming.
2. **PaperProf chunks it** into thematic sections and picks one at random.
3. **Choose your mode:**
   - **Open questions**: write a free-form answer, get structured tutor feedback: a verdict, what you got right, what you missed, and a model answer.
   - **MCQ**: four plausible options, instant client-side grading, and a one-sentence explanation for *every* choice, not just the right one.
4. **A score ring** tracks your session in real time.
5. **End the session** and FLUX.2-klein generates a unique image inspired by the topics you just studied: a small visual reward for showing up.

The whole question-answer-feedback loop runs on **MiniCPM4.1-8B**, our QLoRA fine-tune of openbmb's latest 8B model, loaded once and shared between question generation and answer evaluation.

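
The "loaded once and shared" idea can be sketched in a few lines. This is a minimal illustration, not PaperProf's actual code: `get_model`, `ask_question`, and `grade_answer` are hypothetical names, and the loader returns a placeholder dictionary instead of real weights.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Hypothetical loader: the body runs once, later calls reuse the cached object."""
    # Placeholder standing in for real model weights (an assumption of this sketch).
    return {"name": "placeholder-model"}


def ask_question(chunk: str) -> str:
    """Hypothetical question generator that borrows the shared model."""
    model = get_model()
    return f"[{model['name']}] question about: {chunk[:40]}"


def grade_answer(answer: str) -> str:
    """Hypothetical evaluator that reuses the very same model object."""
    model = get_model()
    return f"[{model['name']}] feedback on: {answer[:40]}"
```

Both functions call `get_model()`, but only the first call pays the loading cost; `get_model.cache_info()` reports one miss followed by hits.
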
```
PDF upload
  └─► parser.py: PyMuPDF text extraction
       └─► chunker.py: thematic chunking (min/max word caps)
            └─► questioner.py: MiniCPM4-8B writes ONE focused question
                 └─► you answer
                      └─► evaluator.py: the same model grades you like a tutor
                           └─► image_gen.py: FLUX.2-klein paints your session
```

---

## Badges Earned: 6 / 6

Build Small Hackathon awards merit badges for specific technical achievements. Here's where we landed:

| Badge | Status | What it took |
|---|---|---|
| **Off the Grid** | ✅ Earned | Zero external APIs: everything runs via ZeroGPU, no OpenAI key, no rate limits, no data leaving the machine |
| **Well-Tuned** | ✅ Earned | QLoRA fine-tune on SQuAD, model published at `build-small-hackathon/MiniCPM4-8B-PaperProf` |
| **Off-Brand** | ✅ Earned | Hand-built HTML/CSS/JS: Gradio is invisible, the entire UI is 100% custom (see Lesson 2 below) |
| **Llama Champion** | ✅ Earned | GGUF published at `build-small-hackathon/MiniCPM4-8B-PaperProf-GGUF`, llama.cpp CPU runtime wired in via `PAPERPROF_RUNTIME=llamacpp` |
| **Field Notes** | ✅ Earned | This post, plus the interactive `blog/index.html` site |
| **Sharing is Caring** | ✅ Earned | 12 LLM steps across 3 live sessions published as dataset `build-small-hackathon/PaperProf-traces` |

---

## The Real Story: 68 Commits of Lessons

A hackathon README tells you what was built. The git log tells you what actually happened. Ours has 101 commits, and roughly two-thirds of them start with `fix:`. Here is the honest version.

### Lesson 1: Model choice is a compatibility problem, not a benchmark problem

We started with MiniCPM3-4B, upgraded to MiniCPM4-8B for better reasoning, and immediately hit the classic open-model trap: the model card says one thing, the `transformers` version on your machine says another.

```
fix: pin transformers==4.57.1 for MiniCPM4-8B compatibility
```

One pinned version later, everything worked.

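
One way to surface this kind of mismatch early is a startup check against the pin. The sketch below is an illustration under assumptions, not code from PaperProf: `REQUIRED` and `find_mismatches` are hypothetical names, and only `importlib.metadata.version` and `PackageNotFoundError` come from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# The pin quoted in the commit message; extend with other packages as needed.
REQUIRED = {"transformers": "4.57.1"}


def find_mismatches(required, lookup=version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `lookup` is injectable so the check can be exercised without
    installing anything; by default it reads the installed version.
    """
    mismatches = {}
    for package, expected in required.items():
        try:
            found = lookup(package)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` when the app starts and raising if the result is non-empty turns a confusing model-loading failure into a one-line message about the wrong version.
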
The follow-up lesson came from quantization: bitsandbytes 4-bit is great on a 16 GB local GPU and *completely unnecessary* on ZeroGPU's hardware, so we made it conditional:

```python
# HF Spaces (ZeroGPU): skip quantization, use bfloat16 directly
if os.environ.get("SPACE_ID"):
    return None
# Locally: 4-bit when VRAM < 17 GB
```

Same code, two deployment targets, zero config files. Detect the environment, adapt.

### Lesson 2: The Off-Brand badge nearly broke us (and taught us the most)

The hackathon has an **Off-Brand** badge: ship a UI that doesn't look like the framework you built it with. We wanted PaperProf to look like a real product (glassmorphism, animated score ring, dark academia palette), not a Gradio demo.

Attempt #1: restyle Gradio with CSS. We fought the theme system through *eleven consecutive commits* (`fix: CSS labels illisibles`, `fix: override variables CSS Gradio`, `fix: retire primary_hue orange qui changeait toutes les teintes`...), which in English were about unreadable CSS labels, overriding Gradio's CSS variables, and removing an orange `primary_hue` that changed every tint. Gradio's theming always had one more `!important` than we did.

Attempt #2: nuke it from orbit. Docker SDK, FastAPI serving raw HTML, Gradio relegated to a backend. It worked locally and died on Spaces: we lost ZeroGPU integration, which only flows through the Gradio SDK.

Attempt #3, the one that shipped: **the hidden-component bridge**. Keep Gradio as an invisible backend *inside the page*. Serve a fully custom HTML/CSS/JS interface through `gr.HTML`, hide every real Gradio component off-screen, and let a 300 ms JavaScript polling loop ferry data between the two worlds.

This pattern produced the three hardest-won discoveries of the hackathon:

**`display: none` silently kills Gradio.** Components hidden that way never get their Svelte event handlers attached.
The fix is the oldest trick in CSS:

```css
/* collapsed but NOT display:none, so Gradio attaches event handlers */
#hidden-row-question {
  height: 0 !important;
  overflow: visible !important;
}
```

**You can't `.click()` a Gradio button from JS.** Server-side rendering means the synthetic click goes nowhere. What *does* work: programmatically setting a hidden textbox's value through the native property descriptor, then dispatching `input`/`change` events so Svelte notices:

```javascript
function setGradioTA(sel, val) {
  const el = document.querySelector(sel);
  // Call the native value setter of HTMLTextAreaElement directly on the element
  Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value')
    .set.call(el, val);
  // Bubble input/change events so Svelte notices the new value
  el.dispatchEvent(new Event('input', {bubbles: true}));
  el.dispatchEvent(new Event('change', {bubbles: true}));
}
```

Every action in PaperProf (generate question, submit answer, new MCQ) is a timestamp written into a hidden textbox, picked up by a `.change()` listener on the Python side. Buttons that aren't buttons.

**MutationObserver loses to Svelte.** Gradio's reactive DOM updates don't always fire observers the way you'd expect. We surrendered and switched to a humble `setInterval` polling loop. Less elegant, infinitely more reliable. Sometimes the dumb solution is the senior solution.

### Lesson 3: ZeroGPU makes you think in seconds

ZeroGPU gives you a serious GPU for free, but only in short windows around functions decorated with `@spaces.GPU`. That budget reshapes your architecture:

- **First-call cold starts are real.** Loading an 8B model takes ~60–90s the first time. We built the UI to be honest about it: a live elapsed-time counter, escalating messages ("Model loading…", "Still loading… first call can take ~90s"), and a 3-minute hard timeout that unlocks the UI instead of spinning forever.
- **Never download inside the GPU window.** FLUX.2-klein weighs ~16 GB. We prefetch it in a daemon thread at *startup*, so the `@spaces.GPU` window is spent generating, not downloading.
  We even skip a 7.75 GB duplicate ComfyUI checkpoint in the repo that diffusers never reads.
- **Don't burn GPU time on things JavaScript can do.** MCQ grading needs no model call: the LLM emits a structured format once (`QUESTION:` / `A)`–`D)` / `CORRECT:` / `EXPLAIN_A:`…), we parse it into JSON, and the browser grades clicks instantly. Zero latency, zero GPU seconds.

### Lesson 4: The bug that fired twice

Late in the hackathon, our session-summary modal showed every MCQ answer **duplicated**: answer one question, see it counted twice, score 0/2.

The cause was textbook event-handling: MCQ buttons had `btn.onclick = handler` assigned in the display function *and* an `addEventListener` registered by the global wiring function. One click, two handlers, two score increments.

Our first fix removed the wrong one: the `addEventListener` had a timing flaw with its idempotency guard, so clicks then did *nothing at all*. The final fix kept the `onclick` (reassigned fresh with each question, inherently idempotent) and added a `mcqAnswered` re-entrancy guard as a belt-and-suspenders measure.

Moral: when two pieces of code both "helpfully" wire the same button, you don't have redundancy, you have a race.

### Lesson 5: Prompts are product decisions

Small prompt details made the difference between "tech demo" and "usable study tool":

- Early questions were rambling multi-part monsters. The fix was brutal constraint: *"ONE question only, on ONE concept. Maximum 25 words. No sub-questions, no 'and'."*
- The evaluator follows a fixed 4-part structure (Verdict / What was good / What was missing / Model answer) so the frontend can parse and render it as styled sections. The prompt format *is* the API contract.
- With French source PDFs, the model kept drifting into French. Polite instructions lost to the gravitational pull of the context.
  What finally worked: `IMPORTANT: Always write in English, even if the source text is in another language`, stated twice, once at the top and once at the bottom of the prompt. With 8B models, subtlety is wasted; repetition is a feature.

---

## What We'd Tell Past Us

1. **Read the git log of your own project sometimes.** Two-thirds `fix:` commits isn't failure; it's the actual texture of shipping. Each one was a lesson nobody had written down for us.
2. **Frameworks fight back hardest at the edges.** Using Gradio normally is easy. Using it as an invisible backend required understanding how it *actually* renders. The weird workarounds (`height:0`, textbox triggers, polling) are now reusable knowledge.
3. **Free infrastructure imposes honest engineering.** No API credits to hide behind means caring about cold starts, GPU seconds, and weight prefetching. Constraints made the architecture better.
4. **Client-side everything you can.** The MCQ mode is the snappiest feature in the app precisely because it never touches the server after generation.
5. **Ship the small thing.** PaperProf does one loop (read, ask, grade, encourage) and does it end-to-end. A hackathon project that completes one circle beats one that sketches five.

---

## The Stack

| Layer | Choice |
|---|---|
| Q&A + evaluation | MiniCPM4.1-8B (openbmb), QLoRA fine-tune, bfloat16, transformers 4.57.1 |
| Session images | FLUX.2-klein-4B (Black Forest Labs), diffusers |
| PDF parsing | PyMuPDF |
| Backend / hosting | Gradio 6 on Hugging Face Spaces, ZeroGPU |
| Frontend | Hand-written HTML/CSS/JS over a hidden-Gradio bridge |
| External APIs | **None.** 🔌 Fully off the grid. |

---

*Built for the Build Small Hackathon, June 2026. The Space is live, so bring a PDF and let the professor grill you: [huggingface.co/spaces/build-small-hackathon/PaperProf](https://huggingface.co/spaces/build-small-hackathon/PaperProf)*