educrate / README.md
fabrizziomcl's picture
Add LLM-judge (MRBench) socratic-quality results
c18c9a9 verified
|
Raw
History Blame Contribute Delete
3.51 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade
metadata
title: EduCrate  Socratic Tutor
emoji: 📘
colorFrom: gray
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Spanish Socratic tutor on CPU; never gives the answer.
tags:
  - gradio
  - build-small-hackathon
  - track:backyard
  - badge-tiny-titan
  - tiny-titan
  - achievement:offgrid
  - achievement:welltuned
  - achievement:fieldnotes
  - achievement:offbrand
  - sponsor:modal
models:
  - build-small-hackathon/educrate-qwen3-sft
  - fabrizziomcl/nanoballena-qwen3-socratic

EduCrate — A Socratic Tutor for Peruvian Public-School Students

EduCrate never gives the final answer. It guides students with one question at a time, detects their mistake, and offers progressive hints (the maieutic method) so they discover the answer themselves. Spanish-language tutoring focused on mathematical reasoning and reading comprehension, small enough to run on CPU.

Links

The problem

Peru's public secondary schools face a learning crisis. In PISA 2022 (OECD), only 34% of Peruvian 15-year-olds reached basic proficiency in math (66% below) and 50% in reading. Peru's national assessment (ECE / MINEDU, grade 8, 2022) found only about 12.7% Satisfactory in math, with public (state) schools far behind private ones. Most chatbots just hand over the answer — which does not build reasoning.

The model

  • Base: Qwen/Qwen3-0.6B (596M), fine-tuned with LoRA on ~4,000 bilingual (Spanish+English) Socratic dialogues with brief hidden reasoning, generated for this project. LoRA keeps the base competence (no catastrophic forgetting). Runs on CPU.
  • Model: build-small-hackathon/educrate-qwen3-bi

Evaluation (held-out, rigorous)

Socratic behavior — answer-withholding on held-out mGSM (greedy, <think> stripped):

Model ES withhold / asks EN withhold / asks
Qwen3-0.6B (instruct) 0.84 / 1.00 0.91 / 1.00
EduCrate 1.00 / 1.00 1.00 / 1.00

Underlying capability (no degradation) — accuracy vs the Qwen3-0.6B base & instruct:

Model mGSM ES/EN (math) BELEBELE ES/EN (reading)
Qwen3-0.6B-Base 0.00 / 0.00 0.20 / 0.15
Qwen3-0.6B (instruct) 0.44 / 0.51 0.40 / 0.39
EduCrate 0.34 / 0.43 0.51 / 0.54

Reading comprehension improved; math solve-accuracy dips slightly (the model is trained to guide, not solve) — and English is retained, confirming LoRA prevented forgetting.

Socratic quality (LLM-as-judge, MRBench-style rubric, 0–2; judge = Qwen2.5-32B, n=10):

Model Overall Withholds answer Guidance Coherence Tone
Qwen3-0.6B-Base 0.78 1.2 0.6 0.8 0.8
Qwen3-0.6B (instruct) 1.27 1.4 1.2 1.8 1.3
EduCrate 1.72 2.0 1.9 1.9 1.8

EduCrate scores highest on every dimension — the fine-tune improves tutoring quality, not just answer-withholding.

How to use

Click an example, or: (optional) paste a reading passage, choose what you need, and ask your question. The tutor replies in Spanish with a guiding question, never the answer. It is a 0.6B model, so guidance is sometimes imperfect.

Built for the Build Small Hackathon — track Backyard AI, Tiny Titan badge (≤4B). Made with generative AI; validate any pedagogical use with a teacher.