Spaces:

build-small-hackathon
/

educrate

Sleeping

App Files Files Community

educrate / README.md

fabrizziomcl

Add LLM-judge (MRBench) socratic-quality results

c18c9a9 verified 18 days ago

preview code

Raw

History Blame Contribute Delete

3.51 kB

	---
	title: EduCrate — Socratic Tutor
	emoji: 📘
	colorFrom: gray
	colorTo: blue
	sdk: gradio
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Spanish Socratic tutor on CPU; never gives the answer.
	tags:
	- gradio
	- build-small-hackathon
	- track:backyard
	- badge-tiny-titan
	- tiny-titan
	- achievement:offgrid
	- achievement:welltuned
	- achievement:fieldnotes
	- achievement:offbrand
	- sponsor:modal
	models:
	- build-small-hackathon/educrate-qwen3-sft
	- fabrizziomcl/nanoballena-qwen3-socratic
	---

	# EduCrate — A Socratic Tutor for Peruvian Public-School Students

	EduCrate never gives the final answer. It guides students with one question at a time,
	detects their mistake, and offers progressive hints (the maieutic method) so they
	discover the answer themselves. Spanish-language tutoring focused on mathematical
	reasoning and reading comprehension, small enough to run on CPU.

	## Links
	- Demo video: _TODO: paste link_
	- Social post: _TODO: paste link_
	- Model: https://huggingface.co/build-small-hackathon/educrate-qwen3-bi

	## The problem
	Peru's public secondary schools face a learning crisis. In PISA 2022 (OECD), only 34%
	of Peruvian 15-year-olds reached basic proficiency in math (66% below) and 50% in
	reading. Peru's national assessment (ECE / MINEDU, grade 8, 2022) found only about 12.7%
	Satisfactory in math, with public (state) schools far behind private ones. Most chatbots
	just hand over the answer — which does not build reasoning.

	## The model
	- Base: Qwen/Qwen3-0.6B (596M), fine-tuned with LoRA on **~4,000 bilingual
	(Spanish+English) Socratic dialogues** with brief hidden reasoning, generated for this
	project. LoRA keeps the base competence (no catastrophic forgetting). Runs on CPU.
	- Model: `build-small-hackathon/educrate-qwen3-bi`

	## Evaluation (held-out, rigorous)
	Socratic behavior — answer-withholding on held-out mGSM (greedy, `<think>` stripped):

	\| Model \| ES withhold / asks \| EN withhold / asks \|
	\|---\|---\|---\|
	\| Qwen3-0.6B (instruct) \| 0.84 / 1.00 \| 0.91 / 1.00 \|
	\| EduCrate \| 1.00 / 1.00 \| 1.00 / 1.00 \|

	Underlying capability (no degradation) — accuracy vs the Qwen3-0.6B base & instruct:

	\| Model \| mGSM ES/EN (math) \| BELEBELE ES/EN (reading) \|
	\|---\|---\|---\|
	\| Qwen3-0.6B-Base \| 0.00 / 0.00 \| 0.20 / 0.15 \|
	\| Qwen3-0.6B (instruct) \| 0.44 / 0.51 \| 0.40 / 0.39 \|
	\| EduCrate \| 0.34 / 0.43 \| 0.51 / 0.54 \|

	Reading comprehension improved; math solve-accuracy dips slightly (the model is trained
	to guide, not solve) — and English is retained, confirming LoRA prevented forgetting.

	Socratic quality (LLM-as-judge, MRBench-style rubric, 0–2; judge = Qwen2.5-32B, n=10):

	\| Model \| Overall \| Withholds answer \| Guidance \| Coherence \| Tone \|
	\|---\|---\|---\|---\|---\|---\|
	\| Qwen3-0.6B-Base \| 0.78 \| 1.2 \| 0.6 \| 0.8 \| 0.8 \|
	\| Qwen3-0.6B (instruct) \| 1.27 \| 1.4 \| 1.2 \| 1.8 \| 1.3 \|
	\| EduCrate \| 1.72 \| 2.0 \| 1.9 \| 1.9 \| 1.8 \|

	EduCrate scores highest on every dimension — the fine-tune improves tutoring quality, not
	just answer-withholding.

	## How to use
	Click an example, or: (optional) paste a reading passage, choose what you need, and ask
	your question. The tutor replies in Spanish with a guiding question, never the answer.
	It is a 0.6B model, so guidance is sometimes imperfect.

	> Built for the Build Small Hackathon — track Backyard AI, Tiny Titan badge (≤4B).
	> Made with generative AI; validate any pedagogical use with a teacher.