---
title: PaperProf
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: false
tags:
  - track:backyard
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---

# PaperProf: AI Study Buddy

## Demo

Video walkthrough: https://youtu.be/eyoXrGMjXWc

LinkedIn post: https://www.linkedin.com/posts/ryad-gazenay_buildsmallhackathon-huggingface-gradio-ugcPost-7471900513991729152-Th-Y/

## Models used

- [build-small-hackathon/MiniCPM4-8B-PaperProf](https://huggingface.co/build-small-hackathon/MiniCPM4-8B-PaperProf): QLoRA fine-tune of openbmb/MiniCPM4-8B on SQuAD, used for question generation and answer evaluation
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell): FLUX.2-klein-4B, used for session image generation

## Sponsor prize categories

- OpenBMB (MiniCPM4.1-8B)
- Black Forest Labs (FLUX.2-klein-4B)

PaperProf turns any course PDF into an interactive study session.
Upload your lecture notes or textbook, receive auto-generated questions drawn
directly from the material, type your answers, and get instant, constructive
feedback powered by a local LLM (MiniCPM4-8B).

---

## How it works

```
PDF upload
    └─► core/parser.py      - extract raw text with PyMuPDF
         └─► core/chunker.py - split text into thematic chunks
              └─► core/questioner.py - LLM generates a question from a chunk
                   └─► student answers
                        └─► core/evaluator.py - LLM evaluates & explains
```

The LLM (loaded once at startup via `model/llm.py`) handles both question
generation and answer evaluation. Everything runs locally, so no API keys are needed.
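
A load-once wrapper of this kind can be sketched with `functools.lru_cache`. The class name `LLM`, the accessor `get_llm`, and the stub backend below are assumptions made for illustration, not the contents of `model/llm.py`; a real loader would replace `_load_backend` with the actual model load.

```python
from functools import lru_cache
from typing import Callable


class LLM:
    """Thin wrapper that owns one text-generation backend (hypothetical name)."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend

    def generate(self, prompt: str) -> str:
        return self._backend(prompt)


def _load_backend() -> Callable[[str], str]:
    # Placeholder for the expensive model load. A stub keeps this sketch
    # runnable without a GPU or a model download.
    return lambda prompt: f"[stub completion for {len(prompt)} chars]"


@lru_cache(maxsize=1)
def get_llm() -> LLM:
    """Build the wrapper on the first call; later calls return the cached one."""
    return LLM(_load_backend())


# Question generation and answer evaluation would share the same instance.
questioner_llm = get_llm()
evaluator_llm = get_llm()
print(questioner_llm is evaluator_llm)
```

Caching the accessor rather than using a module-level global keeps the load lazy, so importing the module stays cheap until the first call.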

---

## File structure

```
PaperProf/
├── app.py                  # Gradio UI, entry point
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── core/
│   ├── __init__.py
│   ├── parser.py           # PDF → plain text  (PyMuPDF)
│   ├── chunker.py          # plain text → thematic chunks
│   ├── questioner.py       # chunk → study question  (LLM)
│   └── evaluator.py        # (question, chunk, answer) → feedback  (LLM)
└── model/
    ├── __init__.py
    └── llm.py              # singleton LLM wrapper  (MiniCPM4-8B / Transformers)
```

### File roles

| File | Role |
|---|---|
| `app.py` | Builds the Gradio interface and wires the pipeline together. |
| `core/parser.py` | Opens the PDF with PyMuPDF (`fitz`) and extracts plain text page by page. |
| `core/chunker.py` | Splits the raw text on paragraph boundaries, merging short paragraphs and capping chunk size so each chunk fits comfortably in the LLM prompt. |
| `core/questioner.py` | Sends a chunk to the LLM with a professor-style prompt and returns one open-ended question. |
| `core/evaluator.py` | Sends the question, source chunk, and student answer to the LLM, which returns a structured verdict + model answer. |
| `model/llm.py` | Loads `openbmb/MiniCPM4-8B` once via Transformers, exposes a `generate(prompt)` method, and caches the instance as a singleton. |
| `requirements.txt` | Pins all Python dependencies needed to run the project. |
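
The chunking strategy described for `core/chunker.py` (split on paragraph boundaries, merge short paragraphs, cap chunk size) could look roughly like the sketch below. The function name `chunk_text` and the `min_chars` and `max_chars` thresholds are assumptions chosen for illustration, and the hard split by character count is a simplification that may cut words in half.

```python
import re


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 1200) -> list[str]:
    """Split on blank lines, merge short paragraphs, and cap chunk size.

    Assumes min_chars <= max_chars. Both defaults are illustrative.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that is longer than the cap.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if current and len(candidate) > max_chars:
                # Merging would exceed the cap, so emit what we have.
                chunks.append(current)
                current = piece
            else:
                current = candidate
            if len(current) >= min_chars:
                # Long enough to stand alone as a chunk.
                chunks.append(current)
                current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 50 + "\n\n" + "b" * 50 + "\n\n" + "c" * 3000
for chunk in chunk_text(sample, min_chars=80, max_chars=1000):
    print(len(chunk))
```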

---

## Setup

```bash
# 1. Create a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. (Optional) override the model or device
export PAPERPROF_MODEL="openbmb/MiniCPM3-4B"   # smaller model for testing
export PAPERPROF_DEVICE="cuda"                  # cuda | mps | cpu | auto

# 4. Launch
python app.py
```

The Gradio app will open at `http://localhost:7860`.
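
As a rough illustration of how the optional overrides from step 3 might be consumed, the sketch below reads `PAPERPROF_MODEL` and `PAPERPROF_DEVICE` and resolves `auto` to a concrete device. The function names, the default of `auto`, and the cuda, then mps, then cpu preference order are assumptions, not confirmed behavior of `model/llm.py`.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "openbmb/MiniCPM4-8B"  # assumed default, taken from the file-roles table
VALID_DEVICES = ("cuda", "mps", "cpu", "auto")


def read_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (model_id, device) from the environment, falling back to defaults."""
    model = env.get("PAPERPROF_MODEL", DEFAULT_MODEL)
    device = env.get("PAPERPROF_DEVICE", "auto").lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"PAPERPROF_DEVICE must be one of {VALID_DEVICES}, got {device!r}")
    return model, device


def resolve_device(device: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Turn "auto" into a concrete device: cuda first, then mps, then cpu."""
    if device != "auto":
        return device
    if cuda_ok:
        return "cuda"
    return "mps" if mps_ok else "cpu"


model_id, requested = read_settings({"PAPERPROF_MODEL": "openbmb/MiniCPM3-4B"})
print(model_id, resolve_device(requested, cuda_ok=False, mps_ok=False))
```

In a real run, `cuda_ok` and `mps_ok` could come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; passing them in as booleans keeps the sketch free of a PyTorch dependency.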

---

## Usage

1. Click **Upload course PDF** and choose your file.
2. Click **Load PDF**. PaperProf parses the document and reports how many
   chunks were found.
3. Click **New Question** to get a question generated from a random chunk.
4. Type your answer in the **Your Answer** box.
5. Click **Submit Answer** to receive structured feedback.

Repeat steps 3–5 as many times as you like to practice the full material.
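
Behind the buttons, one round of steps 3 to 5 amounts to picking a random chunk, asking the model for a question, and then asking it to judge the answer. The sketch below shows that flow with a stand-in model. The function names, prompt wording, and `fake_generate` are hypothetical and do not reproduce the prompts in `core/questioner.py` or `core/evaluator.py`.

```python
import random
from typing import Callable

Generate = Callable[[str], str]


def new_question(chunks: list[str], generate: Generate, rng: random.Random) -> tuple[str, str]:
    """Pick a random chunk and ask the model for one open-ended question about it."""
    chunk = rng.choice(chunks)
    prompt = (
        "You are a professor. Write one open-ended exam question "
        f"based only on this excerpt:\n\n{chunk}"
    )
    return chunk, generate(prompt).strip()


def grade(question: str, chunk: str, answer: str, generate: Generate) -> str:
    """Ask the model to judge the answer against the source excerpt."""
    prompt = (
        f"Source excerpt:\n{chunk}\n\nQuestion: {question}\n"
        f"Student answer: {answer}\n\n"
        "Give a verdict, a short explanation, and a model answer."
    )
    return generate(prompt).strip()


def fake_generate(prompt: str) -> str:
    # Stand-in for the real model so the sketch runs anywhere.
    if "exam question" in prompt:
        return " What does the excerpt claim? "
    return "Verdict: partially correct"


chunks = ["Photosynthesis converts light into chemical energy.", "Mitochondria produce ATP."]
chunk, question = new_question(chunks, fake_generate, random.Random(0))
print(question)
print(grade(question, chunk, "It makes energy.", fake_generate))
```

Passing the source chunk to the grading step lets the model check the answer against the material rather than against its own general knowledge.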

---

## Requirements

- Python ≥ 3.10
- A GPU with ≥ 10 GB VRAM is recommended for MiniCPM4-8B in bfloat16.
  CPU inference works but is slow; set `PAPERPROF_MODEL` to a 4B variant for
  faster CPU runs.