	# process_log.md — S2.T3.1 · AI Math Tutor for Early Learners

	Candidate: John Eze
	Challenge: S2.T3.1 — Tier 3
	Date: 2026-04-24
	Hard cap: 4 hours

	---

	## Hour-by-Hour Timeline

	| Time | Activity |
	|------|----------|
	| 00:00–00:45 | Read full brief end-to-end. Examined seed files (`curriculum_seed.json`, `diagnostic_probes_seed.csv`, `parent_report_schema.json`, `child_utt_sample_seed.csv`). Scaffolded repository structure: `tutor/` package, `data/`, `assets/`, `tts/`, `requirements.txt`, `README.md`, `SIGNED.md`. |
	| 00:45–01:30 | Built `generate_curriculum.py` (60+ item generator). Built `tutor/curriculum_loader.py`. |
	| 01:30–02:15 | Implemented `tutor/adaptive.py`: BKT per sub-skill with update rules, plus an Elo baseline. Built `kt_eval.ipynb` skeleton with held-out replay simulation. |
	| 02:15–02:45 | Built `tutor/visual.py` (PIL counting images), `tutor/feedback.py` (EN/FR/KIN feedback table), `tutor/asr_adapt.py` (Whisper-tiny stub + langdetect). |
	| 02:45–03:15 | Built `tutor/db.py` (encrypted SQLite store) and `tutor/dp_sync.py` (ε-DP aggregation design). |
	| 03:15–03:45 | Built `demo.py` (Gradio child-facing UI: learner select, tutor loop, parent view). |
	| 03:45–04:00 | Built `parent_report.py`. Completed `footprint_report.md`, `README.md`, `process_log.md`. Final push. |
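
The timeline lists an ε-DP aggregation design for `tutor/dp_sync.py` without spelling it out. One common way to realize such a design is the Laplace mechanism applied to per-skill mastery counts, sketched below. This is an assumption about the approach, not the contents of `dp_sync.py`; the function names, the sub-skill names, and the even split of the privacy budget across counts are all hypothetical.

```python
import random
from typing import Dict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random,
             sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-DP via the Laplace mechanism.

    Assumes each learner changes the count by at most `sensitivity`.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def dp_mastery_counts(mastered_by_skill: Dict[str, int], epsilon_total: float,
                      rng: random.Random) -> Dict[str, float]:
    """Noise every per-skill count, splitting the budget evenly.

    Under basic sequential composition, k releases at epsilon_total / k each
    cost epsilon_total overall.
    """
    if not mastered_by_skill:
        return {}
    eps_each = epsilon_total / len(mastered_by_skill)
    return {skill: dp_count(n, eps_each, rng)
            for skill, n in mastered_by_skill.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    # Hypothetical sub-skill names and counts.
    print(dp_mastery_counts({"counting": 10, "addition": 3}, 1.0, rng))
```

A smaller `epsilon_total` gives stronger privacy and noisier counts. In a real sync path the random source should be a cryptographically secure one rather than a seeded `random.Random`.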

	---

	## LLM / Tool Use Declaration

	| Tool | Why used |
	|------|----------|
	| GitHub Copilot (Claude Sonnet 4.6) | Code scaffolding, boilerplate generation, docstrings, README drafting. All logic reviewed, debugged, and adapted by me. |

	### Three sample prompts I actually sent

	1. "Build the BKT update rule for 5 sub-skills — P_know, P_learn, P_guess, P_slip — with a mastery threshold of 0.85 and return the next best item."
	2. "Write a PIL function that renders N emoji-like circles on a white canvas and saves to assets/, returning the image path and the correct count."
	3. "Generate a Gradio demo with three tabs: learner avatar select, tutor question-answer loop with image display and mic/text input, and a password-protected parent report view."
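
The first prompt above names the four standard BKT parameters and a 0.85 mastery threshold. A minimal sketch of such an update follows. It uses the textbook BKT formulation and is not necessarily what `tutor/adaptive.py` contains. The parameter values, the sub-skill names, and the names `BKTParams`, `bkt_update`, and `next_skill` are illustrative assumptions, and item selection is simplified to picking the weakest unmastered sub-skill.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MASTERY_THRESHOLD = 0.85  # threshold quoted in the prompt


@dataclass(frozen=True)
class BKTParams:
    # Illustrative values, not fitted to any data.
    p_learn: float = 0.15  # chance of learning the skill after one attempt
    p_guess: float = 0.20  # chance of a correct answer without the skill
    p_slip: float = 0.10   # chance of a wrong answer despite the skill


def predict_correct(p_know: float, params: BKTParams) -> float:
    """Probability that the next answer on this sub-skill is correct."""
    return p_know * (1 - params.p_slip) + (1 - p_know) * params.p_guess


def bkt_update(p_know: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior on mastery, followed by the learning transition."""
    if correct:
        evidence = p_know * (1 - params.p_slip)
        total = evidence + (1 - p_know) * params.p_guess
    else:
        evidence = p_know * params.p_slip
        total = evidence + (1 - p_know) * (1 - params.p_guess)
    posterior = evidence / total
    return posterior + (1 - posterior) * params.p_learn


def next_skill(mastery: Dict[str, float]) -> Optional[str]:
    """Return the weakest sub-skill below the threshold, or None if all mastered."""
    open_skills = {s: p for s, p in mastery.items() if p < MASTERY_THRESHOLD}
    if not open_skills:
        return None
    return min(open_skills, key=open_skills.get)


if __name__ == "__main__":
    params = BKTParams()
    # Hypothetical sub-skill names.
    mastery = {"counting": 0.9, "addition": 0.2, "subtraction": 0.5}
    skill = next_skill(mastery)
    mastery[skill] = bkt_update(mastery[skill], correct=True, params=params)
    print(skill, round(mastery[skill], 3))
```

With these illustrative parameters, one correct answer moves a sub-skill from 0.2 to 0.6, so several correct answers in a row are needed before it crosses 0.85.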

	### One prompt I discarded and why

	I drafted a prompt asking Copilot to "fully fine-tune TinyLlama with QLoRA for numeracy feedback generation". I discarded it for three reasons: (a) fine-tuning would not fit in the 4-hour time cap; (b) a TinyLlama int4 GGUF is ~650 MB, which would blow the footprint budget if bundled; and (c) rule-based, template-driven feedback is faster, more predictable, and fully explainable, which matters for a child-safety product. LLM-generated feedback can be added after the hackathon.

	---

	## Single Hardest Decision

	The hardest decision was choosing between a Deep Knowledge Tracing (DKT) GRU model and Bayesian Knowledge Tracing (BKT). DKT would likely score a higher AUC on complex interaction sequences, but it requires PyTorch, adds ~50 MB to the footprint, and takes 30+ min to train on CPU. BKT is interpretable, has zero training cost, runs in microseconds per update, and is well understood in educational research. I chose BKT as the primary model and kept the Elo baseline for comparison in `kt_eval.ipynb`. This satisfies the brief's AUC comparison requirement while staying within the 75 MB footprint and the 4-hour time budget.
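
To make the AUC comparison concrete, here is a rough sketch of an Elo-style baseline replayed over an answer log, plus a dependency-free AUC. The logistic Elo form, the step size `k`, the log format, and every function name are assumptions for illustration. The actual procedure in `kt_eval.ipynb` may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def elo_expected(theta: float, difficulty: float) -> float:
    """Logistic probability of a correct answer given ability and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def elo_update(theta: float, difficulty: float, correct: bool,
               k: float = 0.4) -> Tuple[float, float]:
    """Move ability and difficulty in opposite directions by k * (outcome - expected)."""
    delta = k * ((1.0 if correct else 0.0) - elo_expected(theta, difficulty))
    return theta + delta, difficulty - delta


def replay_elo(log: Sequence[Tuple[str, bool]], k: float = 0.4) -> List[float]:
    """Replay (item_id, correct) pairs, recording each prediction before its update."""
    theta = 0.0
    difficulty: Dict[str, float] = {}
    predictions = []
    for item_id, correct in log:
        b = difficulty.get(item_id, 0.0)
        predictions.append(elo_expected(theta, b))
        theta, difficulty[item_id] = elo_update(theta, b, correct, k)
    return predictions


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Fraction of (correct, incorrect) pairs ranked in the right order; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical answer log: (item_id, answered correctly?).
    log = [("q1", True), ("q2", False), ("q1", True), ("q3", False), ("q2", True)]
    preds = replay_elo(log)
    print(round(auc([c for _, c in log], preds), 3))
```

Scoring the BKT predictions on the same log with the same `auc` function puts the two models on an equal footing. The pairwise AUC here is quadratic in the number of answers, which is fine for a small replay set.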

	---

	Log updated continuously throughout the session.
