---
title: PaperProf
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
PaperProf: AI Study Buddy
Demo
Video walkthrough: https://youtu.be/eyoXrGMjXWc
LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/
Models used
- build-small-hackathon/MiniCPM4-8B-PaperProf: QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- black-forest-labs/FLUX.1-schnell: FLUX.2-klein-4B, used for session image generation
Sponsor prize categories
- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)
PaperProf turns any course PDF into an interactive study session. Upload your lecture notes or textbook, receive auto-generated questions drawn directly from the material, type your answers, and get instant, constructive feedback powered by a local LLM (MiniCPM4-8B).
How it works
```
PDF upload
──► core/parser.py: extract raw text with PyMuPDF
──► core/chunker.py: split text into thematic chunks
──► core/questioner.py: LLM generates a question from a chunk
──► student answers
──► core/evaluator.py: LLM evaluates & explains
```
The LLM (loaded once at startup via model/llm.py) handles both question generation and answer evaluation. Everything runs locally, so no API keys are needed.
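Because one model serves both roles, each step reduces to building a prompt and post-processing the completion. The sketch below is a minimal illustration, assuming a `generate(prompt) -> str` callable like the one model/llm.py exposes. The prompt wording, the `VERDICT:` / `EXPLANATION:` / `MODEL ANSWER:` output format, and the names `generate_question` and `evaluate_answer` are hypothetical, not PaperProf's actual prompts or functions.

```python
import re
from typing import Callable, Dict

Generate = Callable[[str], str]

# Hypothetical prompt templates; the real questioner/evaluator wording may differ.
QUESTION_PROMPT = (
    "You are a university professor. Read the course excerpt below and ask "
    "ONE open-ended question that tests understanding of it. "
    "Reply with the question only.\n\nExcerpt:\n{chunk}"
)
EVAL_PROMPT = (
    "You are a university professor grading a student.\n"
    "Excerpt:\n{chunk}\n\nQuestion: {question}\nStudent answer: {answer}\n\n"
    "Reply in exactly this format:\n"
    "VERDICT: correct | partially correct | incorrect\n"
    "EXPLANATION: <one or two sentences>\n"
    "MODEL ANSWER: <a concise ideal answer>"
)

# Matches one labelled line of the (hypothetical) evaluation format.
FIELD_RE = re.compile(r"^\s*(VERDICT|EXPLANATION|MODEL ANSWER)\s*:\s*(.*)$", re.IGNORECASE)


def generate_question(chunk: str, generate: Generate) -> str:
    """Ask the LLM for a question and keep only the first non-empty line."""
    raw = generate(QUESTION_PROMPT.format(chunk=chunk))
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    return lines[0] if lines else ""


def evaluate_answer(question: str, chunk: str, answer: str, generate: Generate) -> Dict[str, str]:
    """Ask the LLM to grade an answer and parse the labelled fields it returns."""
    raw = generate(EVAL_PROMPT.format(chunk=chunk, question=question, answer=answer))
    # Missing fields stay empty so a loosely formatted reply still yields a result.
    fields = {"verdict": "", "explanation": "", "model_answer": ""}
    for line in raw.splitlines():
        match = FIELD_RE.match(line)
        if match:
            key = match.group(1).lower().replace(" ", "_")
            fields[key] = match.group(2).strip()
    return fields
```

Tolerant parsing matters here because a small local model will not always follow the requested format exactly.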
File structure
```
PaperProf/
├── app.py               # Gradio UI, entry point
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── core/
│   ├── __init__.py
│   ├── parser.py        # PDF → plain text (PyMuPDF)
│   ├── chunker.py       # plain text → thematic chunks
│   ├── questioner.py    # chunk → study question (LLM)
│   └── evaluator.py     # (question, chunk, answer) → feedback (LLM)
└── model/
    ├── __init__.py
    └── llm.py           # singleton LLM wrapper (MiniCPM4-8B / Transformers)
```
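The tree describes model/llm.py as a singleton LLM wrapper. One way to get load-once behaviour is double-checked locking around a class-level instance. This is a sketch under stated assumptions, not the file's actual contents: the names `LLM.get` and `load_transformers` are hypothetical, the default model id mirrors the one named in this README, and the Transformers loader only runs the first time `LLM.get()` is called with it.

```python
import os
import threading
from typing import Callable, Optional

GenerateFn = Callable[[str], str]


def load_transformers() -> GenerateFn:
    """Hypothetical loader: wraps a Transformers causal LM in a generate() closure."""
    # Imported lazily so importing this module does not pull in torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = os.environ.get("PAPERPROF_MODEL", "openbmb/MiniCPM4-8B")
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package
        trust_remote_code=True,
    )

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=256)
        # Decode only the newly generated tokens, not the echoed prompt.
        new_tokens = output[0, inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate


class LLM:
    """Process-wide singleton: the model is loaded on first use, then reused."""

    _instance: Optional["LLM"] = None
    _lock = threading.Lock()

    def __init__(self, generate_fn: GenerateFn) -> None:
        self._generate_fn = generate_fn

    @classmethod
    def get(cls, loader: Callable[[], GenerateFn] = load_transformers) -> "LLM":
        # Double-checked locking: concurrent requests trigger at most one load.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls(loader())
        return cls._instance

    def generate(self, prompt: str) -> str:
        return self._generate_fn(prompt)
```

Injecting the loader keeps tests and CPU-only experiments cheap: a stub loader can be passed to `LLM.get()` so the real weights are never downloaded.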
File roles
| File | Role |
|---|---|
| app.py | Builds the Gradio interface and wires the pipeline together. |
| core/parser.py | Opens the PDF with PyMuPDF (fitz) and extracts plain text page by page. |
| core/chunker.py | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| core/questioner.py | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| core/evaluator.py | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict plus a model answer. |
| model/llm.py | Loads openbmb/MiniCPM4-8B once via Transformers, exposes a generate(prompt) method, and caches the instance as a singleton. |
| requirements.txt | Pins all Python dependencies needed to run the project. |
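The chunker's role (split on paragraph boundaries, merge short paragraphs, cap chunk size) can be sketched as a greedy pass. This is a minimal illustration, not core/chunker.py itself: the function name `chunk_text`, measuring size in characters rather than tokens, and the `min_chars=300` / `max_chars=2000` defaults are all assumptions.

```python
import re
from typing import List


def chunk_text(text: str, min_chars: int = 300, max_chars: int = 2000) -> List[str]:
    """Hypothetical greedy chunker: merge short paragraphs, never exceed max_chars."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    # Split oversized paragraphs on word boundaries so no piece exceeds max_chars.
    pieces: List[str] = []
    for para in paragraphs:
        while len(para) > max_chars:
            cut = para.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available: fall back to a hard cut
            pieces.append(para[:cut].strip())
            para = para[cut:].strip()
        if para:
            pieces.append(para)

    # Keep appending pieces while the current chunk is still shorter than min_chars.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(current) < min_chars and len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Capping by characters is a cheap proxy for the model's context budget; a token-based cap using the model's tokenizer would be more precise at the cost of tokenizing every paragraph.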
Setup
```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"  # smaller model for testing
export PAPERPROF_DEVICE="cuda"                # cuda | mps | cpu | auto

# 4. Launch
python app.py
```
The Gradio app will open at http://localhost:7860.
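The two environment variables from step 3 could be resolved along these lines. This is a hypothetical helper, not code from app.py or model/llm.py: the name `resolve_config`, the fallback order for `auto` (CUDA, then Apple MPS, then CPU), and the default model id are assumptions. The hardware probes are passed in as booleans so the logic can be checked without PyTorch installed.

```python
from typing import Mapping, Tuple

DEFAULT_MODEL = "openbmb/MiniCPM4-8B"  # assumed default, mirroring this README
VALID_DEVICES = {"cuda", "mps", "cpu", "auto"}


def resolve_config(
    env: Mapping[str, str],
    cuda_available: bool,
    mps_available: bool,
) -> Tuple[str, str]:
    """Return (model_id, device) from PAPERPROF_* variables (hypothetical helper)."""
    model_id = env.get("PAPERPROF_MODEL", DEFAULT_MODEL)
    device = env.get("PAPERPROF_DEVICE", "auto").strip().lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"PAPERPROF_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    if device == "auto":
        # Prefer the fastest backend that is actually present.
        if cuda_available:
            device = "cuda"
        elif mps_available:
            device = "mps"
        else:
            device = "cpu"
    return model_id, device
```

In a real loader the probes would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, e.g. `resolve_config(os.environ, torch.cuda.is_available(), torch.backends.mps.is_available())`. An explicit device such as `cuda` is passed through unchanged.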
Usage
1. Click Upload course PDF and choose your file.
2. Click Load PDF; PaperProf parses the document and reports how many chunks were found.
3. Click New Question to get a question generated from a random chunk.
4. Type your answer in the Your Answer box.
5. Click Submit Answer to receive structured feedback.

Repeat steps 3–5 as many times as you like to work through the full material.
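The usage steps map onto a small piece of per-user state: the chunks from the loaded PDF plus the question currently on screen and the chunk it came from. The class below is a hypothetical sketch of that state; the names `StudySession`, `load`, `new_question`, and `submit` are illustrative, not taken from app.py. Question generation and grading are injected as plain callables so the sketch runs without a model.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StudySession:
    """Hypothetical per-user study state for the Load / New Question / Submit flow."""

    ask: Callable[[str], str]               # chunk -> question
    grade: Callable[[str, str, str], str]   # (question, chunk, answer) -> feedback
    rng: random.Random = field(default_factory=random.Random)
    chunks: List[str] = field(default_factory=list)
    question: Optional[str] = None
    chunk: Optional[str] = None

    def load(self, chunks: List[str]) -> int:
        """Load PDF: store the chunks and return how many were found."""
        self.chunks = list(chunks)
        self.question = self.chunk = None
        return len(self.chunks)

    def new_question(self) -> str:
        """New Question: pick a random chunk and ask a question about it."""
        if not self.chunks:
            raise RuntimeError("Load a PDF first.")
        self.chunk = self.rng.choice(self.chunks)
        self.question = self.ask(self.chunk)
        return self.question

    def submit(self, answer: str) -> str:
        """Submit Answer: grade against the chunk the current question came from."""
        if self.question is None or self.chunk is None:
            raise RuntimeError("Ask for a question first.")
        return self.grade(self.question, self.chunk, answer)
```

Remembering the source chunk alongside the question is what lets the evaluator compare the answer with the original material. In a Gradio app, an object like this would typically be kept in `gr.State` so each browser session has its own question.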
Requirements
- Python ≥ 3.10
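- A GPU is recommended for MiniCPM4-8B in bfloat16. The weights alone take roughly 16 GB (about 8 billion parameters × 2 bytes each), so plan for more than 16 GB of VRAM.
  CPU inference works but is slow; set PAPERPROF_MODEL to a 4B variant (for example openbmb/MiniCPM3-4B) for faster CPU runs.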