---
title: PaperProf
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:welltuned
- achievement:offbrand
- achievement:llama
- achievement:sharing
- achievement:fieldnotes
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
# PaperProf: AI Study Buddy
## Demo
Video walkthrough: https://youtu.be/eyoXrGMjXWc
LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/
## Models used
- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf): QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell): FLUX.2-klein-4B, used for session image generation
## Sponsor prize categories
- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)
PaperProf turns any course PDF into an interactive study session.
Upload your lecture notes or textbook, receive auto-generated questions drawn
directly from the material, type your answers, and get instant, constructive
feedback powered by a local LLM (MiniCPM4-8B).
---
## How it works
```
PDF upload
  └─► core/parser.py      - extract raw text with PyMuPDF
  └─► core/chunker.py     - split text into thematic chunks
  └─► core/questioner.py  - LLM generates a question from a chunk
  └─► student answers
  └─► core/evaluator.py   - LLM evaluates & explains
```
The LLM (loaded once at startup via `model/llm.py`) handles both question
generation and answer evaluation. Everything runs locally, so no API keys are needed.
---
## File structure
```
PaperProf/
├── app.py                  # Gradio UI, entry point
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── core/
│   ├── __init__.py
│   ├── parser.py           # PDF → plain text (PyMuPDF)
│   ├── chunker.py          # plain text → thematic chunks
│   ├── questioner.py       # chunk → study question (LLM)
│   └── evaluator.py        # (question, chunk, answer) → feedback (LLM)
└── model/
    ├── __init__.py
    └── llm.py              # singleton LLM wrapper (MiniCPM4-8B / Transformers)
```
### File roles
| File | Role |
|---|---|
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
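To make the chunking role in the table more concrete, here is a minimal sketch of splitting on paragraph boundaries, merging short paragraphs, and capping chunk size. The function name `chunk_text` and the `min_chars` / `max_chars` thresholds are assumptions for illustration; the actual `core/chunker.py` may use different names, limits, or a token-based cap.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, cap every chunk at max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that is longer than the cap.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            merged = f"{current}\n\n{piece}" if current else piece
            if len(merged) <= max_chars:
                current = merged           # still room: keep merging
            else:
                chunks.append(current)     # merging would exceed the cap
                current = piece
            if len(current) >= min_chars:  # long enough to stand alone
                chunks.append(current)
                current = ""
    if current:
        chunks.append(current)             # flush the trailing short chunk
    return chunks
```

A character cap is the simplest choice; a token-based cap would track the model's context limit more closely at the cost of needing the tokenizer.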
---
## Setup
```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B" # smaller model for testing
export PAPERPROF_DEVICE="cuda" # cuda | mps | cpu | auto
# 4. Launch
python app.py
```
The Gradio app will open at `http://localhost:7860`.
---
## Usage
1. Click **Upload course PDF** and choose your file.
2. Click **Load PDF**. PaperProf parses the document and reports how many
chunks were found.
3. Click **New Question** to get a question generated from a random chunk.
4. Type your answer in the **Your Answer** box.
5. Click **Submit Answer** to receive structured feedback.
Repeat steps 3 to 5 as many times as you like to practice the full material.
---
## Requirements
- Python ≥ 3.10
- A GPU with ≥ 10 GB VRAM is recommended for MiniCPM4-8B in bfloat16.
CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B variant for
faster CPU runs.