---
title: PaperProf
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
- track:backyard
- sponsor:openbmb
- achievement:offgrid
- achievement:welltuned
- achievement:offbrand
- achievement:llama
- achievement:sharing
- achievement:fieldnotes
---
# PaperProf: AI Study Buddy

PaperProf turns any course PDF into an interactive study session.
Upload your lecture notes or textbook, receive auto-generated questions drawn
directly from the material, type your answers, and get instant, constructive
feedback powered by a local LLM (MiniCPM4-8B).

## Demo

- Video walkthrough: https://youtu.be/eyoXrGMjXWc
- LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

## Models used

- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf): QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell): FLUX.2-klein-4B, used for session image generation

## Sponsor prize categories

- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)

---
## How it works
```
PDF upload
  └─► core/parser.py: extract raw text with PyMuPDF
        └─► core/chunker.py: split text into thematic chunks
              └─► core/questioner.py: LLM generates a question from a chunk
                    └─► student answers
                          └─► core/evaluator.py: LLM evaluates & explains
```
The LLM (loaded once at startup via `model/llm.py`) handles both question
generation and answer evaluation. Everything runs locally, so no API keys are needed.

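
The two LLM stages in the diagram above can be sketched as prompt builders around the `generate(prompt)` method that `model/llm.py` exposes. This is an illustrative sketch, not PaperProf's code: the prompt wording, the `VERDICT:` / `MODEL ANSWER:` / `EXPLANATION:` reply format, and the names `make_question`, `evaluate_answer` and `Feedback` are all assumptions. The LLM is passed in as any function that takes a prompt string and returns text, which also makes the stages easy to test with a fake.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: prompt wording, reply format and names are assumptions,
# not PaperProf's actual core/questioner.py and core/evaluator.py.
Generate = Callable[[str], str]  # same shape as llm.generate(prompt) -> str


def make_question(chunk: str, generate: Generate) -> str:
    """Ask the LLM for one open-ended study question about a chunk."""
    prompt = (
        "You are a university professor. Read the course excerpt below and write "
        "ONE open-ended question a student could answer from it.\n\n"
        f"Excerpt:\n{chunk}\n\nQuestion:"
    )
    return generate(prompt).strip()


@dataclass
class Feedback:
    verdict: str       # e.g. "correct", "partially correct", "incorrect"
    model_answer: str
    explanation: str


def evaluate_answer(question: str, chunk: str, answer: str, generate: Generate) -> Feedback:
    """Grade a student answer against the source chunk and parse the reply."""
    prompt = (
        "Grade the student's answer using only the excerpt. Reply with exactly:\n"
        "VERDICT: <correct | partially correct | incorrect>\n"
        "MODEL ANSWER: <a concise ideal answer>\n"
        "EXPLANATION: <one or two sentences of feedback>\n\n"
        f"Excerpt:\n{chunk}\n\nQuestion: {question}\nStudent answer: {answer}\n"
    )
    fields = {"VERDICT": "", "MODEL ANSWER": "", "EXPLANATION": ""}
    for line in generate(prompt).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip().upper()
        if sep and key in fields:
            fields[key] = value.strip()
    return Feedback(
        verdict=fields["VERDICT"].lower(),
        model_answer=fields["MODEL ANSWER"],
        explanation=fields["EXPLANATION"],
    )
```

If the model's reply drifts from this format, the missing fields come back as empty strings rather than raising, so the caller can decide whether to show the raw reply instead.
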
---
## File structure
```
PaperProf/
├── app.py              # Gradio UI (entry point)
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── core/
│   ├── __init__.py
│   ├── parser.py       # PDF → plain text (PyMuPDF)
│   ├── chunker.py      # plain text → thematic chunks
│   ├── questioner.py   # chunk → study question (LLM)
│   └── evaluator.py    # (question, chunk, answer) → feedback (LLM)
└── model/
    ├── __init__.py
    └── llm.py          # singleton LLM wrapper (MiniCPM4-8B / Transformers)
```
### File roles
| File | Role |
|---|---|
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` (or the model named in `PAPERPROF_MODEL`) once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
---
## Setup
```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B" # smaller model for testing
export PAPERPROF_DEVICE="cuda" # cuda | mps | cpu | auto
# 4. Launch
python app.py
```
The Gradio app is then served at `http://localhost:7860`, Gradio's default port.

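
Step 3 overrides the model and device through the `PAPERPROF_MODEL` and `PAPERPROF_DEVICE` environment variables. A minimal sketch of how a singleton wrapper like `model/llm.py` might read them and cache one instance follows. The defaults, the device validation, and the injected `loader` callable are assumptions made so the sketch runs without downloading a model; the names `LLMConfig`, `resolve_config` and `get_llm` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

# Hypothetical sketch of the config/singleton part of model/llm.py.
DEFAULT_MODEL = "openbmb/MiniCPM4-8B"   # model named in this README
DEFAULT_DEVICE = "auto"                 # assumed default
VALID_DEVICES = ("cuda", "mps", "cpu", "auto")


@dataclass(frozen=True)
class LLMConfig:
    model_id: str
    device: str


def resolve_config(env: Optional[Mapping[str, str]] = None) -> LLMConfig:
    """Read PAPERPROF_MODEL / PAPERPROF_DEVICE, falling back to the defaults."""
    env = os.environ if env is None else env
    model_id = env.get("PAPERPROF_MODEL") or DEFAULT_MODEL
    device = (env.get("PAPERPROF_DEVICE") or DEFAULT_DEVICE).strip().lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"PAPERPROF_DEVICE must be one of {VALID_DEVICES}, got {device!r}")
    return LLMConfig(model_id=model_id, device=device)


_instance: Any = None  # process-wide cached LLM


def get_llm(loader: Callable[[LLMConfig], Any]) -> Any:
    """Return the cached LLM; loader(config) runs only on the first call."""
    global _instance
    if _instance is None:
        _instance = loader(resolve_config())
    return _instance
```

In a real module, `loader` would be the function that builds the Transformers model and tokenizer (for example via `AutoModelForCausalLM.from_pretrained(config.model_id, ...)`). Passing it in keeps the caching logic testable without a GPU.
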
---
## Usage
1. Click **Upload course PDF** and choose your file.
2. Click **Load PDF**. PaperProf parses the document and reports how many
   chunks were found.
3. Click **New Question** to get a question generated from a random chunk.
4. Type your answer in the **Your Answer** box.
5. Click **Submit Answer** to receive structured feedback.

Repeat steps 3–5 as many times as you like to work through the full material.

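
The button flow above maps onto a small piece of session state that the Gradio callbacks could share. In this hypothetical sketch, `StudySession`, its method names and messages, and the injected `ask`/`grade` callables stand in for whatever `app.py` actually wires to `core/questioner.py` and `core/evaluator.py`.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StudySession:
    """Hypothetical session state behind the Load PDF / New Question / Submit buttons."""
    ask: Callable[[str], str]                 # chunk -> question
    grade: Callable[[str, str, str], str]     # (question, chunk, answer) -> feedback
    rng: random.Random = field(default_factory=random.Random)
    chunks: list[str] = field(default_factory=list)
    chunk: Optional[str] = None
    question: Optional[str] = None

    def load(self, chunks: list[str]) -> str:
        """Step 2: store the parsed chunks and report how many were found."""
        self.chunks = list(chunks)
        self.chunk = self.question = None
        return f"Found {len(self.chunks)} chunk(s)."

    def new_question(self) -> str:
        """Step 3: pick a random chunk and generate a question from it."""
        if not self.chunks:
            return "Load a PDF first."
        self.chunk = self.rng.choice(self.chunks)
        self.question = self.ask(self.chunk)
        return self.question

    def submit(self, answer: str) -> str:
        """Steps 4-5: grade the answer against the chunk the question came from."""
        if self.question is None or self.chunk is None:
            return "Ask for a question first."
        if not answer.strip():
            return "Please type an answer."
        return self.grade(self.question, self.chunk, answer)
```

Keeping the chunk alongside the question matters: the evaluator grades the answer against the same source text the question was generated from.
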
---
## Requirements
- Python ≥ 3.10
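- A GPU is recommended for MiniCPM4-8B in bfloat16; plan for more than 16 GB of
  VRAM, since the weights alone take roughly 16 GB (about 8B parameters × 2 bytes).
  CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B variant (for
  example `openbmb/MiniCPM3-4B`) for faster CPU runs.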
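
The VRAM figure can be sanity-checked by multiplying the parameter count by the bytes each weight occupies. This sketch treats MiniCPM4-8B as a round 8 billion parameters (an approximation) and counts weights only; activations and the KV cache need memory on top of that, and quantized formats carry some extra overhead for their scaling factors.

```python
# Bytes per parameter for common weight storage formats.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "4bit": 0.5,  # e.g. the NF4 format used by QLoRA, ignoring per-block scales
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory (1 GB = 10**9 bytes) needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Assumption: 8e9 is a round figure for MiniCPM4-8B's parameter count.
    for dtype in ("bfloat16", "int8", "4bit"):
        print(f"8B params in {dtype}: ~{weight_memory_gb(8e9, dtype):.0f} GB")
```

By this estimate, bfloat16 weights for an 8B model come to about 16 GB, while an 8-bit or 4-bit load would need roughly 8 GB or 4 GB for the weights.
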