educrate / README.md
fabrizziomcl's picture
Add LLM-judge (MRBench) socratic-quality results
c18c9a9 verified
|
Raw
History Blame Contribute Delete
3.51 kB
---
title: EduCrate Socratic Tutor
emoji: 📘
colorFrom: gray
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Spanish Socratic tutor on CPU; never gives the answer.
tags:
- gradio
- build-small-hackathon
- track:backyard
- badge-tiny-titan
- tiny-titan
- achievement:offgrid
- achievement:welltuned
- achievement:fieldnotes
- achievement:offbrand
- sponsor:modal
models:
- build-small-hackathon/educrate-qwen3-sft
- fabrizziomcl/nanoballena-qwen3-socratic
---
# EduCrate — A Socratic Tutor for Peruvian Public-School Students
EduCrate never gives the final answer. It guides students with one question at a time,
detects their mistake, and offers progressive hints (the maieutic method) so they
discover the answer themselves. Spanish-language tutoring focused on mathematical
reasoning and reading comprehension, small enough to run on CPU.
## Links
- Demo video: _TODO: paste link_
- Social post: _TODO: paste link_
- Model: https://huggingface.co/build-small-hackathon/educrate-qwen3-bi
## The problem
Peru's public secondary schools face a learning crisis. In PISA 2022 (OECD), only 34%
of Peruvian 15-year-olds reached basic proficiency in math (66% below) and 50% in
reading. Peru's national assessment (ECE / MINEDU, grade 8, 2022) found only about 12.7%
Satisfactory in math, with public (state) schools far behind private ones. Most chatbots
just hand over the answer — which does not build reasoning.
## The model
- Base: **Qwen/Qwen3-0.6B** (596M), fine-tuned with **LoRA** on **~4,000 bilingual
(Spanish+English) Socratic dialogues** with brief hidden reasoning, generated for this
project. LoRA keeps the base competence (no catastrophic forgetting). Runs on **CPU**.
- Model: `build-small-hackathon/educrate-qwen3-bi`
## Evaluation (held-out, rigorous)
**Socratic behavior** — answer-withholding on held-out mGSM (greedy, `<think>` stripped):
| Model | ES withhold / asks | EN withhold / asks |
|---|---|---|
| Qwen3-0.6B (instruct) | 0.84 / 1.00 | 0.91 / 1.00 |
| **EduCrate** | **1.00 / 1.00** | **1.00 / 1.00** |
**Underlying capability (no degradation)** — accuracy vs the Qwen3-0.6B base & instruct:
| Model | mGSM ES/EN (math) | BELEBELE ES/EN (reading) |
|---|---|---|
| Qwen3-0.6B-Base | 0.00 / 0.00 | 0.20 / 0.15 |
| Qwen3-0.6B (instruct) | 0.44 / 0.51 | 0.40 / 0.39 |
| **EduCrate** | 0.34 / 0.43 | **0.51 / 0.54** |
Reading comprehension *improved*; math solve-accuracy dips slightly (the model is trained
to *guide*, not solve) — and English is retained, confirming LoRA prevented forgetting.
**Socratic quality (LLM-as-judge, MRBench-style rubric, 0–2; judge = Qwen2.5-32B, n=10):**
| Model | Overall | Withholds answer | Guidance | Coherence | Tone |
|---|---|---|---|---|---|
| Qwen3-0.6B-Base | 0.78 | 1.2 | 0.6 | 0.8 | 0.8 |
| Qwen3-0.6B (instruct) | 1.27 | 1.4 | 1.2 | 1.8 | 1.3 |
| **EduCrate** | **1.72** | **2.0** | **1.9** | **1.9** | **1.8** |
EduCrate scores highest on every dimension — the fine-tune improves tutoring quality, not
just answer-withholding.
## How to use
Click an example, or: (optional) paste a reading passage, choose what you need, and ask
your question. The tutor replies in Spanish with a guiding question, never the answer.
It is a 0.6B model, so guidance is sometimes imperfect.
> Built for the Build Small Hackathon — track Backyard AI, Tiny Titan badge (≤4B).
> Made with generative AI; validate any pedagogical use with a teacher.