# PaperProf: Task List

Build Small Hackathon, June 5–15, 2026

## ✅ Done

- Project structure created (app.py, core/, model/, ui/)
- core/parser.py: PDF text extraction with PyMuPDF
- core/chunker.py: paragraph-based text chunking with min/max word caps
- core/questioner.py: LLM-based question generation with professor prompt
- core/evaluator.py: LLM-based answer evaluation with tutor prompt
- model/llm.py: MiniCPM4-8B singleton wrapper via HuggingFace Transformers (bfloat16, chat template)
- app.py: full Gradio UI with PDF upload, question generation, answer evaluation
- requirements.txt: all dependencies listed
- README.md: project description and file structure
- .gitignore: venv, models, cache excluded
- ZeroGPU enabled on HuggingFace Space (free RTX Pro 6000 Blackwell)
- Space live at: https://huggingface.co/spaces/build-small-hackathon/PaperProf

## 🔲 To Do

### 🧪 Testing & Bug Fixes

- [ ] End-to-end test: upload real PDF → load → generate question → answer → get feedback
- [x] Handle edge cases: empty PDF, scanned PDF (no text), very short PDF
- [x] Handle model loading errors gracefully in the UI
- [ ] Test with multiple PDF types (slides, textbook chapters, lecture notes)
- [ ] Fix any ZeroGPU cold start issues (model takes time to load)

### 🎨 UI/Design (Badge: Off-Brand)

- [ ] Replace default Gradio theme with fully custom CSS using gr.Server
- [ ] Add PaperProf logo and branding
- [ ] Add progress indicator when model is generating
- [x] Add session score tracker (X/Y correct answers)
- [ ] Add difficulty selector (Easy / Normal / Hard mode)
- [ ] Add language selector (French / English)
- [ ] Show source chunk used for the question (collapsible)
- [ ] Add end-of-session summary screen with score and weak areas
- [ ] Make the UI mobile-friendly

### 🧠 ML/Model Improvements

- [ ] Add quiz modes: Quick (5 questions) / Full session / Brutal mode
- [ ] Implement adaptive difficulty: revisit failed concepts, skip mastered ones
- [ ] Add chunk relevance scoring to pick the most important chunks first
- [ ] Support multiple PDF uploads in one session
- [ ] Add support for plain text (.txt) and markdown (.md) files
- [ ] Cache parsed chunks in session state to avoid re-parsing on reload

### 🏅 Bonus Quests

- [ ] 🔌 Off the Grid: verify zero external API calls (all local/ZeroGPU)
- [ ] 🎯 Well-Tuned: fine-tune MiniCPM on educational Q&A data using Modal credits ($250)
  - [ ] Find/create educational Q&A dataset on HuggingFace
  - [ ] Write fine-tuning script with LoRA/QLoRA
  - [ ] Run fine-tuning on Modal GPU
  - [ ] Publish fine-tuned model on HuggingFace under build-small-hackathon org
  - [ ] Update model/llm.py to use fine-tuned model
- [ ] 🎨 Off-Brand: fully custom Gradio frontend (see UI section above)
- [ ] 🦙 Llama Champion: switch inference to llama.cpp runtime
  - [ ] Download GGUF version of MiniCPM4-8B
  - [ ] Replace transformers pipeline with llama-cpp-python
  - [ ] Test performance vs transformers
- [ ] 📡 Sharing is Caring: export and share agent trace on HuggingFace Hub
- [x] 📓 Field Notes: write blog post about what we built and learned (BLOG.md)
  - [ ] Document architecture decisions
  - [ ] Include benchmark results (speed, quality)
  - [ ] Publish on HuggingFace blog or personal blog

### 📦 Submission Checklist (deadline: June 15, 2026)

- [ ] App running and stable on HuggingFace Space
- [ ] Demo video (~2 minutes) showing full flow: upload → question → answer → feedback
- [ ] Social media post (LinkedIn + Twitter) with Space link and demo
- [x] Blog post / Field Notes published (BLOG.md)
- [ ] Submission form filled on HuggingFace
- [ ] All badge requirements verified
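
For reference, the paragraph-based chunking listed under Done (core/chunker.py) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `chunk_paragraphs`, the default caps of 40 and 200 words, and the merge-short / split-long strategy are all assumptions.

```python
import re


def chunk_paragraphs(text, min_words=40, max_words=200):
    """Split text into paragraph-based chunks bounded by word counts.

    Hypothetical helper: the name, defaults, and strategy are illustrative
    assumptions, not taken from core/chunker.py.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks = []
    buffer = []  # words carried over from paragraphs too short to stand alone

    for paragraph in paragraphs:
        buffer.extend(paragraph.split())
        # Split off full-size chunks while the buffer exceeds the max cap.
        while len(buffer) > max_words:
            chunks.append(" ".join(buffer[:max_words]))
            buffer = buffer[max_words:]
        # Emit the buffer as a chunk once it reaches the min cap.
        if len(buffer) >= min_words:
            chunks.append(" ".join(buffer))
            buffer = []

    # Leftover words below the min cap: merge into the previous chunk when
    # that stays within the max cap, otherwise keep them as a short chunk.
    if buffer:
        if chunks and len(chunks[-1].split()) + len(buffer) <= max_words:
            chunks[-1] += " " + " ".join(buffer)
        else:
            chunks.append(" ".join(buffer))
    return chunks
```

In this sketch, short paragraphs are merged with the ones that follow until the minimum is reached, and long paragraphs are cut at the maximum, so every chunk except possibly the last stays between the two caps.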