title: PaperProf
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:welltuned
- achievement:offbrand
- achievement:llama
- achievement:sharing
- achievement:fieldnotes

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---
# PaperProf: AI Study Buddy

## Demo

- Video walkthrough: https://youtu.be/eyoXrGMjXWc
- LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

## Models used

- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf): QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell): FLUX.2-klein-4B, used for session image generation

## Sponsor prize categories

- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)

PaperProf turns any course PDF into an interactive study session.
Upload your lecture notes or textbook, receive auto-generated questions drawn
directly from the material, type your answers, and get instant, constructive
feedback powered by a local LLM (MiniCPM4-8B).

---
## How it works

```
PDF upload
  └─► core/parser.py: extract raw text with PyMuPDF
        └─► core/chunker.py: split text into thematic chunks
              └─► core/questioner.py: LLM generates a question from a chunk
                    └─► student answers
                          └─► core/evaluator.py: LLM evaluates & explains
```

The LLM (loaded once at startup via `model/llm.py`) handles both question
generation and answer evaluation. Everything runs locally, so no API keys are needed.
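
The load-once behaviour can be pictured with a small sketch. It is an illustration only, not the contents of `model/llm.py`: `get_llm` and `StubLLM` are hypothetical names, and the stub returns a formatted string where a real wrapper would run the Transformers model.

```python
from functools import lru_cache


class StubLLM:
    """Stand-in for the real model wrapper (hypothetical; no weights are loaded)."""

    load_count = 0  # counts constructions so the caching is observable

    def __init__(self, model_name: str) -> None:
        StubLLM.load_count += 1
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        # A real wrapper would tokenize, run the model, and decode here.
        return f"[{self.model_name}] {prompt}"


@lru_cache(maxsize=1)
def get_llm(model_name: str = "openbmb/MiniCPM4-8B") -> StubLLM:
    """Build the wrapper on the first call and return the cached instance afterwards."""
    return StubLLM(model_name)


# Both pipeline stages ask for the model; only the first call constructs it.
questioner_llm = get_llm()
evaluator_llm = get_llm()
```

Because `get_llm()` is cached, question generation and answer evaluation share one instance, so the expensive load would happen a single time.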

---

## File structure

```
PaperProf/
├── app.py              # Gradio UI, entry point
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── core/
│   ├── __init__.py
│   ├── parser.py       # PDF → plain text (PyMuPDF)
│   ├── chunker.py      # plain text → thematic chunks
│   ├── questioner.py   # chunk → study question (LLM)
│   └── evaluator.py    # (question, chunk, answer) → feedback (LLM)
└── model/
    ├── __init__.py
    └── llm.py          # singleton LLM wrapper (MiniCPM4-8B / Transformers)
```

### File roles

| File | Role |
|---|---|
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
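
One way to implement the chunking strategy described for `core/chunker.py` is sketched below. The function name `chunk_text` and the `min_chars` / `max_chars` thresholds are assumptions chosen for illustration; the project's actual limits and splitting rules may differ.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk length."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Merging would exceed the cap: flush what we have and start over.
            chunks.append(current)
            candidate = para
        current = candidate
        # A single oversized paragraph is hard-split at the cap.
        while len(current) > max_chars:
            chunks.append(current[:max_chars])
            current = current[max_chars:]
        # Emit once the buffer is long enough to stand on its own.
        if len(current) >= min_chars:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)  # trailing short remainder
    return chunks


sample = "A" * 10 + "\n\n" + "B" * 10 + "\n\n" + "C" * 50
chunks = chunk_text(sample, min_chars=30, max_chars=40)
print([len(c) for c in chunks])  # prints [22, 40, 10]
```

In the sample, the two short paragraphs are merged into one 22-character chunk, and the 50-character paragraph is split at the 40-character cap.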

---

## Setup

```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"   # smaller model for testing
export PAPERPROF_DEVICE="cuda"                 # cuda | mps | cpu | auto

# 4. Launch
python app.py
```

The Gradio app will open at `http://localhost:7860`.
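
The optional overrides in step 3 could be read on the Python side roughly as follows. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical names, and the fallback values (`openbmb/MiniCPM4-8B` and `auto`) are guesses at sensible defaults rather than values confirmed by the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"cuda", "mps", "cpu", "auto"}  # options listed in the setup comments


@dataclass(frozen=True)
class Settings:
    model: str
    device: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read PAPERPROF_* overrides, falling back to assumed defaults."""
    env = os.environ if env is None else env
    model = env.get("PAPERPROF_MODEL", "openbmb/MiniCPM4-8B")  # assumed default
    device = env.get("PAPERPROF_DEVICE", "auto").lower()       # assumed default
    if device not in VALID_DEVICES:
        raise ValueError(
            f"PAPERPROF_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}"
        )
    return Settings(model=model, device=device)


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_settings({"PAPERPROF_DEVICE": "CUDA"})
```

Validating the device name up front turns a typo into an immediate, readable error instead of a failure deep inside model loading.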

---

## Usage

1. Click **Upload course PDF** and choose your file.
2. Click **Load PDF**. PaperProf parses the document and reports how many
   chunks were found.
3. Click **New Question** to get a question generated from a random chunk.
4. Type your answer in the **Your Answer** box.
5. Click **Submit Answer** to receive structured feedback.

Repeat steps 3–5 as many times as you like to practice the full material.
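
Steps 3 to 5 amount to a small question-and-feedback loop, which the sketch below imitates without the UI or the real model. The function names, the prompt wording, and the `fake_generate` stand-in are all assumptions for illustration; the actual prompts in `core/questioner.py` and `core/evaluator.py` are not shown in this README.

```python
import random
from typing import Callable, Sequence

Generate = Callable[[str], str]  # assumed shape of the LLM's generate(prompt)


def new_question(chunks: Sequence[str], generate: Generate,
                 rng: random.Random) -> tuple[str, str]:
    """Pick a random chunk and ask the LLM for one open-ended question about it."""
    chunk = rng.choice(chunks)
    prompt = (
        "You are a professor. Write one open-ended question that can be "
        "answered from the passage below.\n\n"
        f"Passage:\n{chunk}\n\nQuestion:"
    )
    return chunk, generate(prompt).strip()


def evaluate_answer(question: str, chunk: str, answer: str,
                    generate: Generate) -> str:
    """Send question, source chunk, and student answer to the LLM for feedback."""
    prompt = (
        "Grade the student's answer using only the passage. "
        "Give a verdict and a model answer.\n\n"
        f"Passage:\n{chunk}\n\nQuestion: {question}\n\n"
        f"Student answer: {answer}\n\nFeedback:"
    )
    return generate(prompt).strip()


prompts: list[str] = []  # records what would be sent to the model


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the real LLM call."""
    prompts.append(prompt)
    return " canned reply "


chunks = [
    "Photosynthesis converts light into chemical energy.",
    "Mitochondria produce ATP through respiration.",
]
chunk, question = new_question(chunks, fake_generate, random.Random(0))
feedback = evaluate_answer(question, chunk, "Plants make sugar from light.", fake_generate)
```

Passing `generate` as a parameter keeps both steps testable without loading any model weights.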

---

## Requirements

- Python ≥ 3.10
- A GPU with ≥ 10 GB VRAM is recommended for MiniCPM4-8B in bfloat16.
  CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B variant for
  faster CPU runs.