| # PaperProf - Task List | |
| Build Small Hackathon - June 5-15, 2026 | |
| ## Done | |
| - Project structure created (app.py, core/, model/, ui/) | |
| - core/parser.py: PDF text extraction with PyMuPDF | |
| - core/chunker.py: paragraph-based text chunking with min/max word caps | |
| - core/questioner.py: LLM-based question generation with professor prompt | |
| - core/evaluator.py: LLM-based answer evaluation with tutor prompt | |
| - model/llm.py: MiniCPM4-8B singleton wrapper via HuggingFace Transformers (bfloat16, chat template) | |
| - app.py: full Gradio UI with PDF upload, question generation, answer evaluation | |
| - requirements.txt: all dependencies listed | |
| - README.md: project description and file structure | |
| - .gitignore: venv, models, cache excluded | |
| - ZeroGPU enabled on HuggingFace Space (free RTX Pro 6000 Blackwell) | |
| - Space live at: https://huggingface.co/spaces/build-small-hackathon/PaperProf | |
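The paragraph-based chunking with min/max word caps listed above could look roughly like the sketch below. It is not the actual `core/chunker.py`: the function name `chunk_paragraphs` and the default caps are hypothetical, chosen only for illustration. Short paragraphs are merged until the minimum is reached, and oversized paragraphs are hard-split at the maximum.

```python
import re


def chunk_paragraphs(text: str, min_words: int = 40, max_words: int = 200) -> list[str]:
    """Split text into paragraph-based chunks bounded by word counts.

    Hypothetical helper: the name and default caps are illustrative only.
    max_words is a hard cap; min_words is best-effort.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.split() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[list[str]] = []
    buffer: list[str] = []
    for words in paragraphs:
        # Hard-split paragraphs that exceed the cap into max_words pieces.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Flush early if adding this piece would break the hard cap.
            if buffer and len(buffer) + len(piece) > max_words:
                chunks.append(buffer)
                buffer = []
            buffer = buffer + piece
            # Emit a chunk as soon as it is long enough.
            if len(buffer) >= min_words:
                chunks.append(buffer)
                buffer = []

    if buffer:
        # Merge a short remainder into the previous chunk when it still fits.
        if chunks and len(chunks[-1]) + len(buffer) <= max_words:
            chunks[-1] = chunks[-1] + buffer
        else:
            chunks.append(buffer)
    return [" ".join(chunk) for chunk in chunks]
```

Because the minimum is best-effort, a chunk can end up shorter than `min_words` when the next paragraph would push it past the hard cap.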
| ## To Do | |
| ### Testing & Bug Fixes | |
| - [ ] End-to-end test: upload real PDF → load → generate question → answer → get feedback | |
| - [x] Handle edge cases: empty PDF, scanned PDF (no text), very short PDF | |
| - [x] Handle model loading errors gracefully in the UI | |
| - [ ] Test with multiple PDF types (slides, textbook chapters, lecture notes) | |
| - [ ] Fix any ZeroGPU cold start issues (model takes time to load) | |
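As an illustration of the PDF edge cases in this list (empty, scanned, very short), here is a minimal sketch of a check that could run on the extracted text before any question is generated. It assumes the parser yields one string per page; the function name `classify_extracted_text` and the 50-word threshold are hypothetical and not taken from the project code.

```python
def classify_extracted_text(pages: list[str], min_words: int = 50) -> str:
    """Return 'empty', 'scanned', 'too_short', or 'ok' for per-page text.

    Hypothetical helper: assumes one extracted string per PDF page.
    """
    if not pages:
        # The document has no pages at all.
        return "empty"
    total_words = sum(len(page.split()) for page in pages)
    if total_words == 0:
        # Pages exist but carry no text layer, which usually means an image-only scan.
        return "scanned"
    if total_words < min_words:
        # Too little material to build a meaningful question from.
        return "too_short"
    return "ok"


# Illustrative user-facing messages for the UI layer (wording is an assumption).
MESSAGES = {
    "empty": "This PDF has no pages.",
    "scanned": "No text found. The PDF may be a scan without a text layer.",
    "too_short": "The PDF is too short to generate questions from.",
}
```

Returning a status string rather than raising keeps the UI code simple: it can look up a message for anything other than `"ok"` and skip the model call entirely.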
| ### UI/Design (Badge: Off-Brand) | |
| - [ ] Replace default Gradio theme with fully custom CSS using gr.Server | |
| - [ ] Add PaperProf logo and branding | |
| - [ ] Add progress indicator when model is generating | |
| - [x] Add session score tracker (X/Y correct answers) | |
| - [ ] Add difficulty selector (Easy / Normal / Hard mode) | |
| - [ ] Add language selector (French / English) | |
| - [ ] Show source chunk used for the question (collapsible) | |
| - [ ] Add end-of-session summary screen with score and weak areas | |
| - [ ] Make the UI mobile-friendly | |
| ### ML/Model Improvements | |
| - [ ] Add quiz modes: Quick (5 questions) / Full session / Brutal mode | |
| - [ ] Implement adaptive difficulty: revisit failed concepts, skip mastered ones | |
| - [ ] Add chunk relevance scoring to pick the most important chunks first | |
| - [ ] Support multiple PDF uploads in one session | |
| - [ ] Add support for plain text (.txt) and markdown (.md) files | |
| - [ ] Cache parsed chunks in session state to avoid re-parsing on reload | |
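One possible shape for the chunk-caching item above is a small cache keyed by a hash of the uploaded file's bytes, so that re-uploading the same PDF skips parsing. This is a sketch under assumptions: the class `ChunkCache` and the `parse` callable are hypothetical stand-ins for whatever the parser and chunker actually expose.

```python
import hashlib
from typing import Callable


class ChunkCache:
    """In-memory cache of parsed chunks, keyed by file content (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    @staticmethod
    def key_for(pdf_bytes: bytes) -> str:
        # Hash the content so a renamed copy of the same file still hits the cache.
        return hashlib.sha256(pdf_bytes).hexdigest()

    def get_or_parse(
        self, pdf_bytes: bytes, parse: Callable[[bytes], list[str]]
    ) -> list[str]:
        key = self.key_for(pdf_bytes)
        if key not in self._store:
            # Cache miss: run the (expensive) parse once and remember the result.
            self._store[key] = parse(pdf_bytes)
        return self._store[key]
```

In a Gradio app, one instance per user session could plausibly be held in `gr.State`, which would keep the cache scoped to that session rather than shared globally.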
| ### Bonus Quests | |
| - [ ] Off the Grid: verify zero external API calls (all local/ZeroGPU) | |
| - [ ] Well-Tuned: fine-tune MiniCPM on educational Q&A data using Modal credits ($250) | |
|   - [ ] Find/create educational Q&A dataset on HuggingFace | |
|   - [ ] Write fine-tuning script with LoRA/QLoRA | |
|   - [ ] Run fine-tuning on Modal GPU | |
|   - [ ] Publish fine-tuned model on HuggingFace under build-small-hackathon org | |
|   - [ ] Update model/llm.py to use fine-tuned model | |
| - [ ] Off-Brand: fully custom Gradio frontend (see UI section above) | |
| - [ ] Llama Champion: switch inference to llama.cpp runtime | |
|   - [ ] Download GGUF version of MiniCPM4-8B | |
|   - [ ] Replace transformers pipeline with llama-cpp-python | |
|   - [ ] Test performance vs transformers | |
| - [ ] Sharing is Caring: export and share agent trace on HuggingFace Hub | |
| - [x] Field Notes: write blog post about what we built and learned (BLOG.md) | |
|   - [ ] Document architecture decisions | |
|   - [ ] Include benchmark results (speed, quality) | |
|   - [ ] Publish on HuggingFace blog or personal blog | |
| ### Submission Checklist (deadline: June 15, 2026) | |
| - [ ] App running and stable on HuggingFace Space | |
| - [ ] Demo video (~2 minutes) showing full flow: upload → question → answer → feedback | |
| - [ ] Social media post (LinkedIn + Twitter) with Space link and demo | |
| - [x] Blog post / Field Notes published (BLOG.md) | |
| - [ ] Submission form filled on HuggingFace | |
| - [ ] All badge requirements verified | |