	---
	title: PaperProf
	emoji: 📄
	colorFrom: purple
	colorTo: blue
	sdk: gradio
	sdk_version: 6.16.0
	python_version: '3.12'
	app_file: app.py
	pinned: false
	tags:
	- track:backyard
	- sponsor:openbmb
	- achievement:offgrid
	- achievement:welltuned
	- achievement:offbrand
	- achievement:llama
	- achievement:sharing
	- achievement:fieldnotes
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	---

	# PaperProf — AI Study Buddy

	## Demo

	Video walkthrough: https://youtu.be/eyoXrGMjXWc

	LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

	## Models used

	- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf) — QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
	- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) — FLUX.2-klein-4B, used for session image generation

	## Sponsor prize categories

	- OpenBMB (MiniCPM4.1-8B)
	- Black Forest Labs (FLUX.2-klein-4B)

	PaperProf turns any course PDF into an interactive study session.
	Upload your lecture notes or textbook, receive auto-generated questions drawn
	directly from the material, type your answers, and get instant, constructive
	feedback powered by a local LLM (MiniCPM4-8B).

	---

	## How it works

	```
	PDF upload
	└─► core/parser.py: extract raw text with PyMuPDF
	    └─► core/chunker.py: split text into thematic chunks
	        └─► core/questioner.py: LLM generates a question from a chunk
	            └─► student answers
	                └─► core/evaluator.py: LLM evaluates & explains
	```

	The LLM (loaded once at startup via `model/llm.py`) handles both question
	generation and answer evaluation. Everything runs locally — no API keys needed.
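
The "loaded once" behaviour can be sketched as a cached factory. This is a minimal illustration rather than the actual contents of `model/llm.py`: the `LLM` class, `get_llm()`, and `_load_backend()` are hypothetical names, and the expensive weight loading is replaced by a stub so the example runs without a GPU.

```python
from functools import lru_cache
from typing import Callable


class LLM:
    """Hypothetical wrapper exposing a single generate(prompt) method."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend

    def generate(self, prompt: str) -> str:
        return self._backend(prompt)


def _load_backend() -> Callable[[str], str]:
    # Stand-in for the expensive step (loading model weights with Transformers).
    # Assumption: a real loader would return a function that runs the model.
    return lambda prompt: f"[stub completion for: {prompt[:40]}]"


@lru_cache(maxsize=1)
def get_llm() -> LLM:
    """Build the wrapper on the first call; later calls reuse the same instance."""
    return LLM(_load_backend())
```

Because both the question generator and the evaluator would call `get_llm()`, they share one instance and the load cost is paid only once per process.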

	---

	## File structure

	```
	PaperProf/
	├── app.py              # Gradio UI, entry point
	├── requirements.txt    # Python dependencies
	├── README.md           # This file
	├── core/
	│   ├── __init__.py
	│   ├── parser.py       # PDF → plain text (PyMuPDF)
	│   ├── chunker.py      # plain text → thematic chunks
	│   ├── questioner.py   # chunk → study question (LLM)
	│   └── evaluator.py    # (question, chunk, answer) → feedback (LLM)
	└── model/
	    ├── __init__.py
	    └── llm.py          # singleton LLM wrapper (MiniCPM4-8B / Transformers)
	```

	### File roles

	| File | Role |
	|---|---|
	| `app.py` | Builds the Gradio interface and wires the pipeline together. |
	| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
	| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so the LLM isn't overloaded. |
	| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
	| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
	| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
	| `requirements.txt` | Pins all Python dependencies needed to run the project. |
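
The chunking strategy described for `core/chunker.py` could look roughly like the sketch below. The function name `chunk_text` and the `min_chars` / `max_chars` thresholds are assumptions for illustration; the README does not state the actual limits or signature.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk length.

    min_chars and max_chars are illustrative values, not the project's settings.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any paragraph that is longer than the cap on its own.
        for start in range(0, len(para), max_chars):
            piece = para[start:start + max_chars]
            # Flush first if appending this piece would exceed the cap.
            if current and len(current) + 2 + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current = f"{current}\n\n{piece}" if current else piece
            # Emit the chunk once it is long enough to stand alone.
            if len(current) >= min_chars:
                chunks.append(current)
                current = ""
    if current:
        chunks.append(current)  # the trailing chunk may be shorter than min_chars
    return chunks
```

No chunk exceeds `max_chars`, and consecutive short paragraphs are joined until they reach `min_chars` or the next piece would overflow the cap.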

	---

	## Setup

	```bash
	# 1. Create a virtual environment
	python -m venv venv
	source venv/bin/activate # Windows: venv\Scripts\activate

	# 2. Install dependencies
	pip install -r requirements.txt

	# 3. (Optional) override the model or device
	export PAPERPROF_MODEL="openbmb/MiniCPM3-4B" # smaller model for testing
	export PAPERPROF_DEVICE="cuda" # cuda | mps | cpu | auto

	# 4. Launch
	python app.py
	```

	The Gradio app will open at `http://localhost:7860`.
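
As a rough sketch of how the two optional overrides might be read at startup: the variable names `PAPERPROF_MODEL` and `PAPERPROF_DEVICE` come from the setup steps above, while the `Settings` class, `load_settings()`, and both default values are assumptions made for this example.

```python
import os
from dataclasses import dataclass

# Assumed defaults: the README does not say what is used when the variables are unset.
DEFAULT_MODEL = "openbmb/MiniCPM4-8B"
DEFAULT_DEVICE = "auto"
VALID_DEVICES = {"cuda", "mps", "cpu", "auto"}


@dataclass(frozen=True)
class Settings:
    model: str
    device: str


def load_settings(env=None) -> Settings:
    """Read the overrides from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    model = env.get("PAPERPROF_MODEL", DEFAULT_MODEL)
    device = env.get("PAPERPROF_DEVICE", DEFAULT_DEVICE).lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"PAPERPROF_DEVICE must be one of {sorted(VALID_DEVICES)}")
    return Settings(model=model, device=device)
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.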

	---

	## Usage

	1. Click "Upload course PDF" and choose your file.
	2. Click "Load PDF". PaperProf parses the document and reports how many
	   chunks were found.
	3. Click "New Question" to get a question generated from a random chunk.
	4. Type your answer in the "Your Answer" box.
	5. Click "Submit Answer" to receive structured feedback.

	Repeat steps 3–5 as many times as you like to practice the full material.
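
The question and feedback steps above can be sketched as two small functions around a `generate(prompt)` callable. The function names and the prompt wording here are hypothetical; the actual prompts in `core/questioner.py` and `core/evaluator.py` are not shown in this README.

```python
import random
from typing import Callable

Generate = Callable[[str], str]  # any function mapping a prompt to model text


def new_question(chunks: list[str], generate: Generate, rng: random.Random) -> tuple[str, str]:
    """Pick a random chunk and ask the model for one open-ended question."""
    chunk = rng.choice(chunks)
    prompt = (
        "You are a professor. Write one open-ended question based only on "
        f"this excerpt:\n\n{chunk}\n\nQuestion:"
    )
    return chunk, generate(prompt).strip()


def evaluate_answer(question: str, chunk: str, answer: str, generate: Generate) -> str:
    """Ask the model for a verdict and a model answer, grounded in the source chunk."""
    prompt = (
        f"Source excerpt:\n{chunk}\n\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n\n"
        "Give a verdict (correct, partially correct, or incorrect), a short "
        "explanation, and a model answer."
    )
    return generate(prompt).strip()
```

Keeping the chunk alongside the question lets the evaluation step ground its feedback in the same passage the question was drawn from.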

	---

	## Requirements

	- Python ≥ 3.10
	- A GPU with ≥ 10 GB VRAM is recommended for MiniCPM4-8B in bfloat16.
	  CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B model such as
	  `openbmb/MiniCPM3-4B` (the override shown in Setup) for faster CPU runs.