PaperProf: Task List
Build Small Hackathon, June 5-15, 2026
✅ Done
- Project structure created (app.py, core/, model/, ui/)
- core/parser.py: PDF text extraction with PyMuPDF
- core/chunker.py: paragraph-based text chunking with min/max word caps
- core/questioner.py: LLM-based question generation with professor prompt
- core/evaluator.py: LLM-based answer evaluation with tutor prompt
- model/llm.py: MiniCPM4-8B singleton wrapper via HuggingFace Transformers (bfloat16, chat template)
- app.py: full Gradio UI with PDF upload, question generation, answer evaluation
- requirements.txt: all dependencies listed
- README.md: project description and file structure
- .gitignore: venv, models, cache excluded
- ZeroGPU enabled on HuggingFace Space (free RTX Pro 6000 Blackwell)
- Space live at: https://huggingface.co/spaces/build-small-hackathon/PaperProf
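
The paragraph-based chunking with min/max word caps listed above could look roughly like the sketch below. It is not the actual contents of core/chunker.py: the function name `chunk_paragraphs` and the default caps are assumptions chosen for illustration.

```python
import re


def chunk_paragraphs(text: str, min_words: int = 40, max_words: int = 200) -> list[str]:
    """Split text on blank lines, merge short paragraphs, split long ones.

    Hypothetical helper: name and default caps are illustrative only.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer: list[str] = []  # words carried over from paragraphs below min_words

    for paragraph in paragraphs:
        buffer.extend(paragraph.split())
        # Emit full-size chunks while the buffer exceeds the upper cap.
        while len(buffer) > max_words:
            chunks.append(" ".join(buffer[:max_words]))
            buffer = buffer[max_words:]
        # Emit the remainder once it reaches the lower cap.
        if len(buffer) >= min_words:
            chunks.append(" ".join(buffer))
            buffer = []

    # Leftover words: attach to the previous chunk if it has room, else keep them.
    if buffer:
        if chunks and len(chunks[-1].split()) + len(buffer) <= max_words:
            chunks[-1] += " " + " ".join(buffer)
        else:
            chunks.append(" ".join(buffer))
    return chunks
```

Merging short paragraphs forward keeps headings and one-line bullets attached to the text that follows them, which tends to give the question generator more context per chunk.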
🔲 To Do
🧪 Testing & Bug Fixes
- End-to-end test: upload real PDF → load → generate question → answer → get feedback
- Handle edge cases: empty PDF, scanned PDF (no text), very short PDF
- Handle model loading errors gracefully in the UI
- Test with multiple PDF types (slides, textbook chapters, lecture notes)
- Fix any ZeroGPU cold start issues (model takes time to load)
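
One possible way to handle the empty, scanned, and very short PDF cases is to classify the extracted text before any question is generated. The sketch below assumes the parser yields one string per page (for example from PyMuPDF's `page.get_text()`); the function name, status labels, messages, and the 50-word threshold are all hypothetical.

```python
def classify_extraction(page_texts: list[str], min_words: int = 50) -> str:
    """Return 'empty', 'no_text', 'too_short', or 'ok' for extracted page texts.

    Hypothetical helper: labels and threshold are illustrative only.
    """
    if not page_texts:
        return "empty"  # the document has no pages at all
    words = sum(len(text.split()) for text in page_texts)
    if words == 0:
        return "no_text"  # pages exist but carry no text layer (likely scanned)
    if words < min_words:
        return "too_short"  # not enough material to build questions from
    return "ok"


# Assumed user-facing messages, keyed by status.
USER_MESSAGES = {
    "empty": "This PDF has no pages.",
    "no_text": "No selectable text found. The PDF may be a scan.",
    "too_short": "The document is too short to generate questions.",
    "ok": "Document loaded.",
}
```

Returning a status string rather than raising keeps the UI callback simple: it can look up the message and show it instead of a stack trace.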
🎨 UI/Design (Badge: Off-Brand)
- Replace default Gradio theme with fully custom CSS using gr.Server
- Add PaperProf logo and branding
- Add progress indicator when model is generating
- Add session score tracker (X/Y correct answers)
- Add difficulty selector (Easy / Normal / Hard mode)
- Add language selector (French / English)
- Show source chunk used for the question (collapsible)
- Add end-of-session summary screen with score and weak areas
- Make the UI mobile-friendly
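
The session score tracker (X/Y correct answers) and the weak-areas part of the summary screen could share one small state object. This is a minimal sketch; the class name `SessionScore` and its fields are assumptions, not existing project code.

```python
from dataclasses import dataclass, field


@dataclass
class SessionScore:
    """Hypothetical per-session tally of answers and missed topics."""

    correct: int = 0
    total: int = 0
    missed_topics: list[str] = field(default_factory=list)

    def record(self, is_correct: bool, topic: str = "") -> None:
        """Count one answer; remember the topic of a wrong answer once."""
        self.total += 1
        if is_correct:
            self.correct += 1
        elif topic and topic not in self.missed_topics:
            self.missed_topics.append(topic)

    def summary(self) -> str:
        """Format the score as 'X/Y correct'."""
        return f"{self.correct}/{self.total} correct"
```

In a Gradio app, an object like this could be kept per user in a `gr.State` value so that scores do not leak between sessions.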
🧠 ML/Model Improvements
- Add quiz modes: Quick (5 questions) / Full session / Brutal mode
- Implement adaptive difficulty: revisit failed concepts, skip mastered ones
- Add chunk relevance scoring to pick the most important chunks first
- Support multiple PDF uploads in one session
- Add support for plain text (.txt) and markdown (.md) files
- Cache parsed chunks in session state to avoid re-parsing on reload
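
For the adaptive difficulty item, one simple approach is a queue that drops a chunk once it has been answered correctly often enough and moves a failed chunk back near the front. The sketch below is an assumption about how this might work, not a committed design; `AdaptiveScheduler`, `mastery_streak`, and `revisit_gap` are hypothetical names.

```python
class AdaptiveScheduler:
    """Hypothetical question scheduler: revisit failures, skip mastered chunks."""

    def __init__(self, chunk_ids, mastery_streak: int = 1, revisit_gap: int = 1):
        self.queue = list(chunk_ids)
        self.mastery_streak = mastery_streak  # correct answers in a row to master a chunk
        self.revisit_gap = revisit_gap  # how many other chunks come before a retry
        self.streak = {chunk_id: 0 for chunk_id in self.queue}

    def next_chunk(self):
        """Return the next chunk to ask about, or None when all are mastered."""
        return self.queue[0] if self.queue else None

    def record(self, chunk_id, is_correct: bool) -> None:
        """Update the queue after the user answers a question on chunk_id."""
        if chunk_id not in self.queue:
            return
        self.queue.remove(chunk_id)
        if is_correct:
            self.streak[chunk_id] += 1
            if self.streak[chunk_id] < self.mastery_streak:
                self.queue.append(chunk_id)  # not mastered yet: ask again later
        else:
            self.streak[chunk_id] = 0
            position = min(self.revisit_gap, len(self.queue))
            self.queue.insert(position, chunk_id)  # retry soon
```

With the defaults, a single correct answer retires a chunk; a stricter mode could raise `mastery_streak` so each concept must be answered correctly several times in a row.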
Bonus Quests
- Off the Grid: verify zero external API calls (all local/ZeroGPU)
- 🎯 Well-Tuned: fine-tune MiniCPM on educational Q&A data using Modal credits ($250)
  - Find/create educational Q&A dataset on HuggingFace
  - Write fine-tuning script with LoRA/QLoRA
  - Run fine-tuning on Modal GPU
  - Publish fine-tuned model on HuggingFace under build-small-hackathon org
  - Update model/llm.py to use fine-tuned model
- 🎨 Off-Brand: fully custom Gradio frontend (see UI section above)
- 🦙 Llama Champion: switch inference to llama.cpp runtime
  - Download GGUF version of MiniCPM4-8B
  - Replace transformers pipeline with llama-cpp-python
  - Test performance vs transformers
- 📡 Sharing is Caring: export and share agent trace on HuggingFace Hub
- Field Notes: write blog post about what we built and learned (BLOG.md)
  - Document architecture decisions
  - Include benchmark results (speed, quality)
  - Publish on HuggingFace blog or personal blog
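
For the speed side of the benchmark items (llama.cpp vs. transformers), a backend-agnostic timing harness may be enough. The sketch below assumes each backend can be wrapped as a function from prompt to generated text; `benchmark` and its result keys are hypothetical, and the default whitespace word count is only a rough stand-in for a real tokenizer.

```python
import time
from statistics import mean
from typing import Callable


def benchmark(
    generate: Callable[[str], str],
    prompts: list[str],
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> dict:
    """Time generate() on each prompt and report latency and throughput.

    Hypothetical helper: the default token count is a whitespace word count.
    """
    latencies: list[float] = []
    tokens = 0
    for prompt in prompts:
        start = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - start)
        tokens += count_tokens(output)
    total = sum(latencies)
    return {
        "runs": len(prompts),
        "mean_latency_s": mean(latencies) if latencies else 0.0,
        "tokens": tokens,
        "tokens_per_s": tokens / total if total > 0 else 0.0,
    }
```

Running the same prompt list through both backends and comparing the two result dictionaries gives comparable numbers, provided the first call (which may include model loading) is excluded or warmed up beforehand.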
📦 Submission Checklist (deadline: June 15, 2026)
- App running and stable on HuggingFace Space
- Demo video (~2 minutes) showing full flow: upload → question → answer → feedback
- Social media post (LinkedIn + Twitter) with Space link and demo
- Blog post / Field Notes published (BLOG.md)
- Submission form filled on HuggingFace
- All badge requirements verified