Spaces:

build-small-hackathon
/

educrate

Sleeping

App Files Files Community

educrate / README.md

fabrizziomcl

Add LLM-judge (MRBench) socratic-quality results

c18c9a9 verified 17 days ago

preview code

Raw

History Blame Contribute Delete

3.51 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

metadata

title: EduCrate — Socratic Tutor
emoji: 📘
colorFrom: gray
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
short_description: Spanish Socratic tutor on CPU; never gives the answer.
tags:
  - gradio
  - build-small-hackathon
  - track:backyard
  - badge-tiny-titan
  - tiny-titan
  - achievement:offgrid
  - achievement:welltuned
  - achievement:fieldnotes
  - achievement:offbrand
  - sponsor:modal
models:
  - build-small-hackathon/educrate-qwen3-sft
  - fabrizziomcl/nanoballena-qwen3-socratic

EduCrate — A Socratic Tutor for Peruvian Public-School Students

EduCrate never gives the final answer. It guides students with one question at a time, detects their mistake, and offers progressive hints (the maieutic method) so they discover the answer themselves. Spanish-language tutoring focused on mathematical reasoning and reading comprehension, small enough to run on CPU.

The problem

Peru's public secondary schools face a learning crisis. In PISA 2022 (OECD), only 34% of Peruvian 15-year-olds reached basic proficiency in math (66% below) and 50% in reading. Peru's national assessment (ECE / MINEDU, grade 8, 2022) found only about 12.7% Satisfactory in math, with public (state) schools far behind private ones. Most chatbots just hand over the answer — which does not build reasoning.

The model

Base: Qwen/Qwen3-0.6B (596M), fine-tuned with LoRA on ~4,000 bilingual (Spanish+English) Socratic dialogues with brief hidden reasoning, generated for this project. LoRA keeps the base competence (no catastrophic forgetting). Runs on CPU.
Model: build-small-hackathon/educrate-qwen3-bi

Evaluation (held-out, rigorous)

Socratic behavior — answer-withholding on held-out mGSM (greedy, <think> stripped):

Model	ES withhold / asks	EN withhold / asks
Qwen3-0.6B (instruct)	0.84 / 1.00	0.91 / 1.00
EduCrate	1.00 / 1.00	1.00 / 1.00

Underlying capability (no degradation) — accuracy vs the Qwen3-0.6B base & instruct:

Model	mGSM ES/EN (math)	BELEBELE ES/EN (reading)
Qwen3-0.6B-Base	0.00 / 0.00	0.20 / 0.15
Qwen3-0.6B (instruct)	0.44 / 0.51	0.40 / 0.39
EduCrate	0.34 / 0.43	0.51 / 0.54

Reading comprehension improved; math solve-accuracy dips slightly (the model is trained to guide, not solve) — and English is retained, confirming LoRA prevented forgetting.

Socratic quality (LLM-as-judge, MRBench-style rubric, 0–2; judge = Qwen2.5-32B, n=10):

Model	Overall	Withholds answer	Guidance	Coherence	Tone
Qwen3-0.6B-Base	0.78	1.2	0.6	0.8	0.8
Qwen3-0.6B (instruct)	1.27	1.4	1.2	1.8	1.3
EduCrate	1.72	2.0	1.9	1.9	1.8

EduCrate scores highest on every dimension — the fine-tune improves tutoring quality, not just answer-withholding.

How to use

Click an example, or: (optional) paste a reading passage, choose what you need, and ask your question. The tutor replies in Spanish with a guiding question, never the answer. It is a 0.6B model, so guidance is sometimes imperfect.

Built for the Build Small Hackathon — track Backyard AI, Tiny Titan badge (≤4B). Made with generative AI; validate any pedagogical use with a teacher.

EduCrate — A Socratic Tutor for Peruvian Public-School Students

Links

The problem

The model

Evaluation (held-out, rigorous)

How to use