Spaces: build-small-hackathon/PaperProf


Ryadg committed 26 days ago

Commit 76e80aa (1 parent: bae929c)

add task.md


Files changed (1)

TASK.md +71 -0

TASK.md ADDED

	@@ -0,0 +1,71 @@

+# PaperProf — Task List
+Build Small Hackathon — June 5–15, 2026
+## ✅ Done
+- Project structure created (app.py, core/, model/, ui/)
+- core/parser.py: PDF text extraction with PyMuPDF
+- core/chunker.py: paragraph-based text chunking with min/max word caps
+- core/questioner.py: LLM-based question generation with professor prompt
+- core/evaluator.py: LLM-based answer evaluation with tutor prompt
+- model/llm.py: MiniCPM4-8B singleton wrapper via HuggingFace Transformers
+- app.py: full Gradio UI with PDF upload, question generation, answer evaluation
+- requirements.txt: all dependencies listed
+- README.md: project description and file structure
+- .gitignore: venv, models, cache excluded
+- ZeroGPU enabled on HuggingFace Space (free RTX Pro 6000 Blackwell)
+- Space live at: https://huggingface.co/spaces/build-small-hackathon/PaperProf
+## 🔲 To Do
+### 🧪 Testing & Bug Fixes
+- [ ] End-to-end test: upload real PDF → load → generate question → answer → get feedback
+- [ ] Handle edge cases: empty PDF, scanned PDF (no text), very short PDF
+- [ ] Handle model loading errors gracefully in the UI
+- [ ] Test with multiple PDF types (slides, textbook chapters, lecture notes)
+- [ ] Fix any ZeroGPU cold start issues (model takes time to load)
+### 🎨 UI/Design (Badge: Off-Brand)
+- [ ] Replace default Gradio theme with fully custom CSS using gr.Server
+- [ ] Add PaperProf logo and branding
+- [ ] Add progress indicator when model is generating
+- [ ] Add session score tracker (X/Y correct answers)
+- [ ] Add difficulty selector (Easy / Normal / Hard mode)
+- [ ] Add language selector (French / English)
+- [ ] Show source chunk used for the question (collapsible)
+- [ ] Add end-of-session summary screen with score and weak areas
+- [ ] Make the UI mobile-friendly
+### 🧠 ML/Model Improvements
+- [ ] Add quiz modes: Quick (5 questions) / Full session / Brutal mode
+- [ ] Implement adaptive difficulty: revisit failed concepts, skip mastered ones
+- [ ] Add chunk relevance scoring to pick the most important chunks first
+- [ ] Support multiple PDF uploads in one session
+- [ ] Add support for plain text (.txt) and markdown (.md) files
+- [ ] Cache parsed chunks in session state to avoid re-parsing on reload
+### 🏅 Bonus Quests
+- [ ] 🔌 Off the Grid: verify zero external API calls (all local/ZeroGPU)
+- [ ] 🎯 Well-Tuned: fine-tune MiniCPM on educational Q&A data using Modal credits ($250)
+  - [ ] Find/create educational Q&A dataset on HuggingFace
+  - [ ] Write fine-tuning script with LoRA/QLoRA
+  - [ ] Run fine-tuning on Modal GPU
+  - [ ] Publish fine-tuned model on HuggingFace under build-small-hackathon org
+  - [ ] Update model/llm.py to use fine-tuned model
+- [ ] 🎨 Off-Brand: fully custom Gradio frontend (see UI section above)
+- [ ] 🦙 Llama Champion: switch inference to llama.cpp runtime
+  - [ ] Download GGUF version of MiniCPM4-8B
+  - [ ] Replace transformers pipeline with llama-cpp-python
+  - [ ] Test performance vs transformers
+- [ ] 📡 Sharing is Caring: export and share agent trace on HuggingFace Hub
+- [ ] 📓 Field Notes: write blog post about what we built and learned
+  - [ ] Document architecture decisions
+  - [ ] Include benchmark results (speed, quality)
+  - [ ] Publish on HuggingFace blog or personal blog
+### 📦 Submission Checklist (deadline: June 15, 2026)
+- [ ] App running and stable on HuggingFace Space
+- [ ] Demo video (~2 minutes) showing full flow: upload → question → answer → feedback
+- [ ] Social media post (LinkedIn + Twitter) with Space link and demo
+- [ ] Blog post / Field Notes published
+- [ ] Submission form filled on HuggingFace
+- [ ] All badge requirements verified
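
The task list describes core/chunker.py as "paragraph-based text chunking with min/max word caps", but the file itself is not part of this commit. As a rough illustration of what such a chunker could look like, here is a minimal sketch. The function name `chunk_paragraphs`, its parameters, and the default caps are hypothetical and are not taken from the repository; the merge and split policy is one reasonable choice among several.

```python
import re
from typing import List


def chunk_paragraphs(text: str, min_words: int = 40, max_words: int = 200) -> List[str]:
    """Group paragraphs into chunks of roughly min_words..max_words words.

    Hypothetical sketch: the name, signature, and default caps are
    assumptions, not the actual contents of core/chunker.py.
    """
    if not 0 < min_words <= max_words:
        raise ValueError("expected 0 < min_words <= max_words")

    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: List[str] = []
    buffer: List[str] = []  # words collected for the chunk being built

    for paragraph in paragraphs:
        buffer.extend(paragraph.split())
        # Cut oversized content into pieces of exactly max_words words.
        while len(buffer) > max_words:
            chunks.append(" ".join(buffer[:max_words]))
            buffer = buffer[max_words:]
        # Emit the chunk once it is long enough; otherwise keep merging
        # with the next paragraph.
        if len(buffer) >= min_words:
            chunks.append(" ".join(buffer))
            buffer = []

    # Leftover words shorter than min_words: fold them into the previous
    # chunk if that stays within max_words, else keep them as a short chunk.
    if buffer:
        if chunks and len(chunks[-1].split()) + len(buffer) <= max_words:
            chunks[-1] = chunks[-1] + " " + " ".join(buffer)
        else:
            chunks.append(" ".join(buffer))

    return chunks
```

With this policy, short paragraphs (slide bullets, headings) are merged forward until the minimum is reached, and long paragraphs are cut at the maximum, so only the final chunk can fall below `min_words`. Cutting on a raw word count can split a sentence in half; a sentence-aware split would be a natural refinement.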