---
title: PaperProf
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---

# PaperProf — AI Study Buddy

## Demo

Video walkthrough: https://youtu.be/eyoXrGMjXWc

LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

## Models used

- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf) — QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) — FLUX.2-klein-4B, used for session image generation

## Sponsor prize categories

- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)

PaperProf turns any course PDF into an interactive study session.
Upload your lecture notes or textbook, receive auto-generated questions drawn
directly from the material, type your answers, and get instant, constructive
feedback powered by a local LLM (MiniCPM4-8B).

---

## How it works

```
PDF upload
    └─► core/parser.py      — extract raw text with PyMuPDF
         └─► core/chunker.py — split text into thematic chunks
              └─► core/questioner.py — LLM generates a question from a chunk
                   └─► student answers
                        └─► core/evaluator.py — LLM evaluates & explains
```

The LLM (loaded once at startup via `model/llm.py`) handles both question
generation and answer evaluation. Everything runs locally, so no API keys are needed.
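The "loaded once" behaviour can be sketched as follows. This is a minimal illustration, not the actual contents of `model/llm.py`: the names `LLM`, `get_llm`, and `_load_backend` are hypothetical, and the backend is a stub so the snippet runs without downloading a model.

```python
from functools import lru_cache
from typing import Callable


class LLM:
    """Thin wrapper that exposes a single generate(prompt) method."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        # `backend` stands in for the real model call (hypothetical).
        self._backend = backend

    def generate(self, prompt: str) -> str:
        return self._backend(prompt)


def _load_backend() -> Callable[[str], str]:
    # Placeholder for the expensive step. In the real project this is
    # where the Transformers model and tokenizer would be loaded.
    return lambda prompt: f"[stub completion for: {prompt}]"


@lru_cache(maxsize=1)
def get_llm() -> LLM:
    """Build the wrapper on the first call; later calls reuse the cached instance."""
    return LLM(_load_backend())
```

Because `get_llm()` is cached, the question generator and the evaluator can both call it and share one instance instead of loading the weights twice.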

---

## File structure

```
PaperProf/
├── app.py                  # Gradio UI — entry point
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── core/
│   ├── __init__.py
│   ├── parser.py           # PDF → plain text  (PyMuPDF)
│   ├── chunker.py          # plain text → thematic chunks
│   ├── questioner.py       # chunk → study question  (LLM)
│   └── evaluator.py        # (question, chunk, answer) → feedback  (LLM)
└── model/
    ├── __init__.py
    └── llm.py              # singleton LLM wrapper  (MiniCPM4-8B / Transformers)
```

### File roles

| File | Role |
|---|---|
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
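
The chunking strategy described for `core/chunker.py` (split on paragraph boundaries, merge short paragraphs, cap chunk size) could look roughly like this. It is a sketch under assumptions: the function name `chunk_text` and the `min_chars` / `max_chars` thresholds are illustrative and may not match the project's actual code.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk length."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        # Hard-split any paragraph that exceeds the cap on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{buffer}\n\n{piece}" if buffer else piece
            if len(candidate) <= max_chars:
                buffer = candidate          # still fits: keep merging
            else:
                chunks.append(buffer)       # flush what we had, start over
                buffer = piece
            if len(buffer) >= min_chars:    # long enough to stand alone
                chunks.append(buffer)
                buffer = ""
    if buffer:
        chunks.append(buffer)               # trailing short remainder
    return chunks
```

With `min_chars=100` and `max_chars=200`, two 50-character paragraphs are merged into one chunk, while a 300-character paragraph is split into pieces of 200 and 100 characters.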

---

## Setup

```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"   # smaller model for testing
export PAPERPROF_DEVICE="cuda"                  # cuda | mps | cpu | auto

# 4. Launch
python app.py
```

The Gradio app will open at `http://localhost:7860`.

---

## Usage

1. Click **Upload course PDF** and choose your file.
2. Click **Load PDF** — PaperProf parses the document and reports how many
   chunks were found.
3. Click **New Question** to get a question generated from a random chunk.
4. Type your answer in the **Your Answer** box.
5. Click **Submit Answer** to receive structured feedback.

Repeat steps 3–5 as many times as you like to practice the full material.

---

## Requirements

- Python ≥ 3.10
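- A GPU is recommended. MiniCPM4-8B in bfloat16 needs roughly 16 GB for the
  weights alone (about 8 billion parameters at 2 bytes each), so a card with
  only 10 GB VRAM will not hold it at that precision.
  CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B variant for
  smaller GPUs or faster CPU runs.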
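
As a rough sizing aid, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes about 8 billion parameters for MiniCPM4-8B (inferred from the model name) and ignores activations, KV cache, and framework overhead, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# bfloat16 stores each parameter in 2 bytes.
print(weight_memory_gb(8e9, 2))    # 8B model in bfloat16 -> 16.0
print(weight_memory_gb(4e9, 2))    # 4B model in bfloat16 -> 8.0
# 4-bit quantization stores roughly half a byte per parameter.
print(weight_memory_gb(8e9, 0.5))  # 8B model in 4-bit    -> 4.0
```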