metadata

title: PaperProf
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

PaperProf — AI Study Buddy

Demo

Video walkthrough: https://youtu.be/eyoXrGMjXWc

LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

Models used

build-small-hackathon/MiniCPM4-8B-PaperProf — QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
black-forest-labs/FLUX.1-schnell — FLUX.2-klein-4B, used for session image generation

Sponsor prize categories

OpenBMB (MiniCPM4.1-8B)
Black Forest Labs (FLUX.2-klein-4B)

PaperProf turns any course PDF into an interactive study session. Upload your lecture notes or textbook, receive auto-generated questions drawn directly from the material, type your answers, and get instant, constructive feedback powered by a local LLM (MiniCPM4-8B).

How it works

PDF upload
    └─► core/parser.py      — extract raw text with PyMuPDF
         └─► core/chunker.py — split text into thematic chunks
              └─► core/questioner.py — LLM generates a question from a chunk
                   └─► student answers
                        └─► core/evaluator.py — LLM evaluates & explains

The LLM (loaded once at startup via model/llm.py) handles both question generation and answer evaluation. Everything runs locally — no API keys needed.
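The load-once behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual contents of model/llm.py: `get_llm` and `StubLLM` are hypothetical names, and the stub stands in for the real Transformers model so the snippet runs without downloading any weights.

```python
from functools import lru_cache


class StubLLM:
    """Hypothetical stand-in for the real model wrapper (no weights loaded)."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        # A real wrapper would tokenize the prompt and run the model here.
        return f"[{self.model_name}] response to: {prompt}"


@lru_cache(maxsize=1)
def get_llm(model_name: str = "openbmb/MiniCPM4-8B") -> StubLLM:
    """Build the wrapper on the first call, then return the cached instance."""
    return StubLLM(model_name)


# Question generation and answer evaluation would share one instance.
questioner_llm = get_llm()
evaluator_llm = get_llm()
```

Caching the constructor call is one simple way to get singleton behaviour; a module-level variable guarded by a lock would be an alternative if the wrapper were created from several threads.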

File structure

PaperProf/
├── app.py                  # Gradio UI — entry point
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── core/
│   ├── __init__.py
│   ├── parser.py           # PDF → plain text  (PyMuPDF)
│   ├── chunker.py          # plain text → thematic chunks
│   ├── questioner.py       # chunk → study question  (LLM)
│   └── evaluator.py        # (question, chunk, answer) → feedback  (LLM)
└── model/
    ├── __init__.py
    └── llm.py              # singleton LLM wrapper  (MiniCPM4-8B / Transformers)

File roles

| File | Role |
| --- | --- |
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
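The chunking strategy described for `core/chunker.py` (split on paragraph boundaries, merge short paragraphs, cap chunk size) might look something like the sketch below. The function name `chunk_text` and the `min_chars` / `max_chars` thresholds are assumptions made for illustration; the real module may use different names and limits.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk length."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        candidate = f"{buffer}\n\n{para}" if buffer else para
        if buffer and len(candidate) > max_chars:
            # Merging would exceed the cap: emit the buffer and start afresh.
            chunks.append(buffer)
            buffer = para
        else:
            buffer = candidate
        # Hard-split anything that is still longer than the cap on its own.
        while len(buffer) > max_chars:
            chunks.append(buffer[:max_chars])
            buffer = buffer[max_chars:].lstrip()
        # Once the buffer is long enough, stop merging and emit it.
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks
```

Note that a short buffer can still be emitted on its own when the next paragraph is too large to merge with it; a production chunker might prefer to split on sentence boundaries instead of a hard character cut.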

Setup

# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"   # smaller model for testing
export PAPERPROF_DEVICE="cuda"                  # cuda | mps | cpu | auto

# 4. Launch
python app.py

The Gradio app will open at http://localhost:7860.

Usage

1. Click Upload course PDF and choose your file.
2. Click Load PDF. PaperProf parses the document and reports how many chunks were found.
3. Click New Question to get a question generated from a random chunk.
4. Type your answer in the Your Answer box.
5. Click Submit Answer to receive structured feedback.

Repeat steps 3–5 as many times as you like to practice the full material.

Requirements

Python ≥ 3.10
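A GPU with ≥ 10 GB VRAM is recommended. Note that MiniCPM4-8B in bfloat16 holds roughly 16 GB of weights (8 billion parameters × 2 bytes each), so a 10 GB card would need quantization or offloading to fit the full model. CPU inference works but is slow; set PAPERPROF_MODEL to a 4B variant for faster CPU runs.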