metadata
title: PaperProf
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


PaperProf: AI Study Buddy

Demo

Video walkthrough: https://youtu.be/eyoXrGMjXWc

LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

Models used

Sponsor prize categories

  • OpenBMB (MiniCPM4.1-8B)
  • Black Forest Labs (FLUX.2-klein-4B)

PaperProf turns any course PDF into an interactive study session. Upload your lecture notes or textbook, receive auto-generated questions drawn directly from the material, type your answers, and get instant, constructive feedback powered by a local LLM (MiniCPM4-8B).


How it works

PDF upload
    └─► core/parser.py      - extract raw text with PyMuPDF
         └─► core/chunker.py - split text into thematic chunks
              └─► core/questioner.py - LLM generates a question from a chunk
                   └─► student answers
                        └─► core/evaluator.py - LLM evaluates & explains

The LLM (loaded once at startup via model/llm.py) handles both question generation and answer evaluation. Everything runs locally; no API keys are needed.
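The load-once behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of model/llm.py: `StubLLM`, `get_llm`, `make_question`, and `evaluate_answer` are hypothetical names, and the stub echoes its prompt instead of running a real model.

```python
from functools import lru_cache


class StubLLM:
    """Stand-in for the real model wrapper; counts how many times it is built."""

    instances_created = 0

    def __init__(self, model_name: str) -> None:
        StubLLM.instances_created += 1
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        # A real wrapper would run the model here; the stub just echoes.
        return f"[{self.model_name}] response to: {prompt[:40]}"


@lru_cache(maxsize=1)
def get_llm() -> StubLLM:
    """Build the wrapper on the first call and return the cached instance afterwards."""
    return StubLLM("openbmb/MiniCPM4-8B")


def make_question(chunk: str) -> str:
    # Question generation and evaluation both go through the same instance.
    return get_llm().generate(f"Write one open-ended question about: {chunk}")


def evaluate_answer(question: str, chunk: str, answer: str) -> str:
    return get_llm().generate(
        f"Question: {question}\nSource: {chunk}\nAnswer: {answer}"
    )
```

Caching the factory rather than using a module-level global keeps imports cheap: the expensive load only happens when the first caller asks for the model.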


File structure

PaperProf/
├── app.py                  # Gradio UI, entry point
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── core/
│   ├── __init__.py
│   ├── parser.py           # PDF → plain text  (PyMuPDF)
│   ├── chunker.py          # plain text → thematic chunks
│   ├── questioner.py       # chunk → study question  (LLM)
│   └── evaluator.py        # (question, chunk, answer) → feedback  (LLM)
└── model/
    ├── __init__.py
    └── llm.py              # singleton LLM wrapper  (MiniCPM4-8B / Transformers)

File roles

| File | Role |
| --- | --- |
| app.py | Builds the Gradio interface and wires the pipeline together. |
| core/parser.py | Opens the PDF with PyMuPDF (fitz) and extracts plain text page by page. |
| core/chunker.py | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
| core/questioner.py | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| core/evaluator.py | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| model/llm.py | Loads openbmb/MiniCPM4-8B once via Transformers, exposes a generate(prompt) method, and caches the instance as a singleton. |
| requirements.txt | Pins all Python dependencies needed to run the project. |
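The chunking behaviour described for core/chunker.py (split on paragraph boundaries, merge short paragraphs, cap chunk size) could look roughly like this. It is a sketch under assumptions: the function name `chunk_text` and the `min_chars` / `max_chars` thresholds are illustrative and not taken from the project.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk length."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that exceeds the cap on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate      # still room: keep merging
            else:
                chunks.append(current)   # merging would exceed the cap: close the chunk
                current = piece
            if len(current) >= min_chars:
                chunks.append(current)   # long enough to stand alone
                current = ""
    if current:
        chunks.append(current)           # flush a trailing short chunk
    return chunks
```

Character counts are used here as a cheap proxy for prompt length; a token-based cap would track the model's context limit more closely.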

Setup

# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"   # smaller model for testing
export PAPERPROF_DEVICE="cuda"                  # cuda | mps | cpu | auto

# 4. Launch
python app.py

The Gradio app will open at http://localhost:7860.
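One way the optional overrides from step 3 might be read at startup is sketched below. `Settings` and `load_settings` are hypothetical names; the default model is the one this README names, while the `auto` default for the device is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_MODEL = "openbmb/MiniCPM4-8B"  # model named in this README
DEFAULT_DEVICE = "auto"                # assumed default, not confirmed by the source
VALID_DEVICES = {"cuda", "mps", "cpu", "auto"}


@dataclass(frozen=True)
class Settings:
    model: str
    device: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the two overrides, falling back to defaults when unset or empty."""
    model = env.get("PAPERPROF_MODEL") or DEFAULT_MODEL
    device = (env.get("PAPERPROF_DEVICE") or DEFAULT_DEVICE).lower()
    if device not in VALID_DEVICES:
        raise ValueError(
            f"PAPERPROF_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}"
        )
    return Settings(model=model, device=device)
```

Passing the environment in as a mapping makes the function easy to exercise without touching the real process environment, for example `load_settings({"PAPERPROF_DEVICE": "cpu"})`.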


Usage

  1. Click Upload course PDF and choose your file.
  2. Click Load PDF. PaperProf parses the document and reports how many chunks were found.
  3. Click New Question to get a question generated from a random chunk.
  4. Type your answer in the Your Answer box.
  5. Click Submit Answer to receive structured feedback.

Repeat steps 3–5 as many times as you like to practice the full material.
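The feedback step can be pictured as building one prompt from the question, source chunk, and answer, then reading labelled fields back out of the reply. The sketch below is an assumption throughout: the prompt wording, the `VERDICT` / `FEEDBACK` / `MODEL ANSWER` labels, and both function names are illustrative, since this README only says the evaluator returns a structured verdict and a model answer.

```python
def build_eval_prompt(question: str, chunk: str, answer: str) -> str:
    """Assemble the three inputs into one grading prompt (wording is illustrative)."""
    return (
        "You are a professor grading a student's answer.\n"
        f"Source material:\n{chunk}\n\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n\n"
        "Reply using exactly these labels:\n"
        "VERDICT: <correct | partially correct | incorrect>\n"
        "FEEDBACK: <one short paragraph>\n"
        "MODEL ANSWER: <an ideal answer>"
    )


def parse_feedback(reply: str) -> dict[str, str]:
    """Pull the labelled fields out of the reply; missing fields stay empty."""
    labels = {"VERDICT": "verdict", "FEEDBACK": "feedback", "MODEL ANSWER": "model_answer"}
    fields = {key: "" for key in labels.values()}
    current = None
    for line in reply.splitlines():
        head, sep, rest = line.partition(":")
        key = labels.get(head.strip().upper()) if sep else None
        if key:
            current = key
            fields[key] = rest.strip()
        elif current and line.strip():
            # Continuation line: append to the field currently being read.
            fields[current] = f"{fields[current]} {line.strip()}".strip()
    return fields
```

Parsing into a dictionary lets the UI show the verdict, the explanation, and the model answer in separate components instead of one block of text.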


Requirements

  • Python ≥ 3.10
  • A GPU with ≥ 10 GB VRAM is recommended for MiniCPM4-8B in bfloat16. CPU inference works but is slow; set PAPERPROF_MODEL to a 4B variant for faster CPU runs.