

PaperProf - Task List

Build Small Hackathon β€” June 5–15, 2026

✅ Done

  • Project structure created (app.py, core/, model/, ui/)
  • core/parser.py: PDF text extraction with PyMuPDF
  • core/chunker.py: paragraph-based text chunking with min/max word caps
  • core/questioner.py: LLM-based question generation with professor prompt
  • core/evaluator.py: LLM-based answer evaluation with tutor prompt
  • model/llm.py: MiniCPM4-8B singleton wrapper via HuggingFace Transformers (bfloat16, chat template)
  • app.py: full Gradio UI with PDF upload, question generation, answer evaluation
  • requirements.txt: all dependencies listed
  • README.md: project description and file structure
  • .gitignore: venv, models, cache excluded
  • ZeroGPU enabled on HuggingFace Space (free RTX Pro 6000 Blackwell)
  • Space live at: https://huggingface.co/spaces/build-small-hackathon/PaperProf
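
As an illustration of the paragraph-based chunking with min/max word caps listed above, here is a minimal sketch. It is not the contents of core/chunker.py; the function name `chunk_paragraphs`, its default caps, and the merge/split policy are assumptions chosen for the example.

```python
import re


def chunk_paragraphs(text: str, min_words: int = 40, max_words: int = 200) -> list[str]:
    """Split text on blank lines, merge short paragraphs, split oversized ones.

    Hypothetical helper: the caps and the merge policy are illustrative only.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer: list[str] = []  # words collected from paragraphs still below min_words

    def flush() -> None:
        # Emit whatever is buffered as one chunk.
        if buffer:
            chunks.append(" ".join(buffer))
            buffer.clear()

    for para in paragraphs:
        words = para.split()
        if len(words) > max_words:
            # Oversized paragraph: emit pending text, then cut into max_words slices.
            flush()
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if len(buffer) + len(words) > max_words:
            flush()
        buffer.extend(words)
        if len(buffer) >= min_words:
            flush()
    flush()  # the trailing chunk may end up shorter than min_words
    return chunks
```

No chunk exceeds `max_words`. The last chunk of a document, or the last slice of a long paragraph, can still fall below `min_words`; whether to drop or keep such remainders is a design choice left open here.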

🔲 To Do

🧪 Testing & Bug Fixes

  • End-to-end test: upload real PDF → load → generate question → answer → get feedback
  • Handle edge cases: empty PDF, scanned PDF (no text), very short PDF
  • Handle model loading errors gracefully in the UI
  • Test with multiple PDF types (slides, textbook chapters, lecture notes)
  • Fix any ZeroGPU cold start issues (model takes time to load)
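
One possible way to approach the empty, scanned, and very short PDF cases is to classify the extracted text before it reaches the model. The sketch below assumes the parser can hand over one string per page; `PdfStatus`, `classify_pages`, the 50-word threshold, and the messages are all hypothetical.

```python
from enum import Enum


class PdfStatus(Enum):
    OK = "ok"
    EMPTY = "empty"          # the PDF has no pages
    NO_TEXT = "no_text"      # pages exist but carry no text layer (likely scanned)
    TOO_SHORT = "too_short"  # some text, but too little to build questions from


def classify_pages(page_texts: list[str], min_words: int = 50) -> PdfStatus:
    """Classify extracted per-page text. The threshold is an assumed default."""
    if not page_texts:
        return PdfStatus.EMPTY
    total_words = sum(len(text.split()) for text in page_texts)
    if total_words == 0:
        return PdfStatus.NO_TEXT
    if total_words < min_words:
        return PdfStatus.TOO_SHORT
    return PdfStatus.OK


# Hypothetical user-facing messages for the non-OK cases.
MESSAGES = {
    PdfStatus.EMPTY: "This PDF has no pages.",
    PdfStatus.NO_TEXT: "No text found. The PDF may be a scan without a text layer.",
    PdfStatus.TOO_SHORT: "This PDF is too short to generate questions from.",
}
```

The UI could show `MESSAGES[status]` whenever the status is not `PdfStatus.OK`, instead of passing unusable text on to question generation.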

🎨 UI/Design (Badge: Off-Brand)

  • Replace default Gradio theme with fully custom CSS using gr.Server
  • Add PaperProf logo and branding
  • Add progress indicator when model is generating
  • Add session score tracker (X/Y correct answers)
  • Add difficulty selector (Easy / Normal / Hard mode)
  • Add language selector (French / English)
  • Show source chunk used for the question (collapsible)
  • Add end-of-session summary screen with score and weak areas
  • Make the UI mobile-friendly
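
The session score tracker and the weak-areas summary could share one small state object. This is a sketch under assumptions: `SessionScore`, its fields, and the idea of tagging each question with a topic string are illustrative and not part of the existing app.

```python
from dataclasses import dataclass, field


@dataclass
class SessionScore:
    """Hypothetical per-session state for the X/Y tracker."""
    correct: int = 0
    total: int = 0
    missed_topics: list[str] = field(default_factory=list)

    def record(self, is_correct: bool, topic: str = "") -> None:
        # Count every answered question; remember each missed topic once.
        self.total += 1
        if is_correct:
            self.correct += 1
        elif topic and topic not in self.missed_topics:
            self.missed_topics.append(topic)

    def label(self) -> str:
        # Text for the score display, e.g. "1/3 correct".
        return f"{self.correct}/{self.total} correct"
```

An object like this could live in per-session state so that scores from different users do not mix. `missed_topics` would then feed the end-of-session summary.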

🧠 ML/Model Improvements

  • Add quiz modes: Quick (5 questions) / Full session / Brutal mode
  • Implement adaptive difficulty: revisit failed concepts, skip mastered ones
  • Add chunk relevance scoring to pick the most important chunks first
  • Support multiple PDF uploads in one session
  • Add support for plain text (.txt) and markdown (.md) files
  • Cache parsed chunks in session state to avoid re-parsing on reload
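
For chunk relevance scoring, a cheap baseline is to rank chunks by how often their terms recur across the whole document. The sketch below is only one assumed heuristic (average document-wide term frequency). `rank_chunks`, `tokenize`, and the small stopword set are hypothetical, and an embedding-based or LLM-based scorer might work better.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real one would be larger and language-aware.
STOPWORDS = frozenset({"the", "and", "for", "with", "that", "this", "are", "was", "from"})


def tokenize(text: str) -> list[str]:
    # Lowercase ASCII-letter tokens longer than two characters, minus stopwords.
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


def rank_chunks(chunks: list[str]) -> list[int]:
    """Return chunk indices ordered from most to least relevant.

    Score = mean document-wide frequency of the chunk's terms, so chunks
    built from recurring vocabulary rank above one-off asides.
    """
    tokenized = [tokenize(chunk) for chunk in chunks]
    doc_freq: Counter[str] = Counter()
    for tokens in tokenized:
        doc_freq.update(tokens)

    def score(tokens: list[str]) -> float:
        if not tokens:
            return 0.0
        return sum(doc_freq[t] for t in tokens) / len(tokens)

    scores = [score(tokens) for tokens in tokenized]
    # sorted() is stable, so ties keep their original document order.
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
```

The `[a-z]+` pattern ignores accented characters, so French documents would need a Unicode-aware tokenizer.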

🏅 Bonus Quests

  • 🔌 Off the Grid: verify zero external API calls (all local/ZeroGPU)
  • 🎯 Well-Tuned: fine-tune MiniCPM on educational Q&A data using Modal credits ($250)
    • Find/create educational Q&A dataset on HuggingFace
    • Write fine-tuning script with LoRA/QLoRA
    • Run fine-tuning on Modal GPU
    • Publish fine-tuned model on HuggingFace under build-small-hackathon org
    • Update model/llm.py to use fine-tuned model
  • 🎨 Off-Brand: fully custom Gradio frontend (see UI section above)
  • 🦙 Llama Champion: switch inference to llama.cpp runtime
    • Download GGUF version of MiniCPM4-8B
    • Replace transformers pipeline with llama-cpp-python
    • Test performance vs transformers
  • 📡 Sharing is Caring: export and share agent trace on HuggingFace Hub
  • 📓 Field Notes: write blog post about what we built and learned (BLOG.md)
    • Document architecture decisions
    • Include benchmark results (speed, quality)
    • Publish on HuggingFace blog or personal blog
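
For the speed side of the benchmark items above (transformers vs llama.cpp, and the results for the blog post), a backend-agnostic timing harness could look like the sketch below. It assumes each backend can be wrapped as a plain `generate(prompt) -> str` callable; the `benchmark` function is hypothetical, and it counts whitespace-separated words as a rough stand-in for tokens.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[str], str], prompts: list[str],
              warmup: int = 1) -> dict[str, float]:
    """Time a text-generation callable over a list of prompts."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    # Warm-up calls are not timed, so one-off model loading does not skew results.
    for prompt in prompts[:warmup]:
        generate(prompt)

    latencies: list[float] = []
    total_words = 0
    for prompt in prompts:
        start = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - start)
        total_words += len(output.split())  # words as a proxy for tokens

    total_time = sum(latencies)
    return {
        "median_s": statistics.median(latencies),
        "mean_s": statistics.fmean(latencies),
        "words_per_s": total_words / total_time if total_time > 0 else 0.0,
    }
```

Running the same prompt list through both backends gives comparable numbers. Quality would still need a separate evaluation, for example manual review of the generated questions.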

📦 Submission Checklist (deadline: June 15, 2026)

  • App running and stable on HuggingFace Space
  • Demo video (~2 minutes) showing full flow: upload → question → answer → feedback
  • Social media post (LinkedIn + Twitter) with Space link and demo
  • Blog post / Field Notes published (BLOG.md)
  • Submission form filled on HuggingFace
  • All badge requirements verified