Gemmacademy: Fractions v1
Built for the Google Gemma 4 Hackathon (May 2026).
A fine-tuned Gemma 4 E2B that runs on a student's Android phone offline, grounded in a teacher's specific lesson content.
What this is
Gemmacademy is an end-to-end pipeline that lets a teacher fine-tune Gemma 4 E2B on their week's lesson materials and ship the resulting model to students' phones. Students then use it offline at home, getting tutoring grounded in what their teacher actually taught, with no internet required.
This repo contains the first deployed model from that pipeline: a Gemma 4 E2B fine-tuned to teach 4th-grade fractions in the style of a fictional teacher named "Mrs. Henderson," using a specific procedural method called the "Pizza Method."
The on-device artifact (gemmacademy-fractions-v1-wi8.litertlm) runs via LiteRT-LM on Android. The student app downloads it once over WiFi, then runs it offline forever after.
Why local AI for this use case
Roughly 15-20 million children in the US live in households without reliable broadband. Globally that number is in the hundreds of millions. These kids attend school, sit through lessons, then go home to environments where every cloud-based tutoring tool and every homework helper simply doesn't work. Beyond connectivity, on-device inference also addresses FERPA constraints around student data leaving school networks, and per-API-call pricing that makes serving low-income students at scale economically infeasible.
A fine-tuned model that fits on a $50 phone has fundamentally different unit economics. That's the bet of this project.
What's in this repo
| File / Directory | Contents | Size |
|---|---|---|
| gemmacademy-fractions-v1-wi8.litertlm | The deployment artifact. Quantized via dynamic_wi8_afp32. This is what runs on student phones. | 4.8 GB |
| bf16/ | Full-precision merged weights (Unsloth merge of LoRA + base into BF16 safetensors). | 9.6 GB |
| lora-adapter/ | The LoRA adapter on its own; apply to base Gemma 4 E2B for a different quantization or platform. | ~240 MB |
| qa-fractions.jsonl | The 500 synthetic training Q&A pairs the model was fine-tuned on. | ~600 KB |
| lesson-content/fractions-pizza-method.txt | The original ~2,000-word lesson description that seeded the synthetic data. | small |
| qa_generation_prompt.md | The system prompt used to generate Q&A from the lesson with Gemma 4 26B. | small |
| eval-results-ship.md | Side-by-side eval of base vs. fine-tuned on 20 questions. | small |
| eval-results-compare.md | Full quantization shootout (BF16 / wi4 / wi8) at rank 32. | small |
| eval-results-r128.md | Quantization shootout (BF16 / wi4 / wi8 / weight_only_wi4) at rank 128. | small |
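The weight-file sizes in this table line up with a simple bytes-per-weight estimate. The sketch below is a back-of-the-envelope check, not a description of the actual file layout: it assumes the ~4.8B parameter count given under Training details, treats 1 GB as 10^9 bytes, and ignores quantization scales, metadata, and any tensors kept at higher precision.

```python
# Rough size estimate: parameter count times bytes per stored weight.
# PARAMS is the approximate figure from the Training details table;
# the bytes-per-weight values are the nominal widths of each format.
PARAMS = 4.8e9

BYTES_PER_WEIGHT = {
    "bf16": 2.0,  # 16-bit weights
    "wi8": 1.0,   # 8-bit weights
    "wi4": 0.5,   # 4-bit weights
}


def approx_size_gb(params: float, bytes_per_weight: float) -> float:
    """Estimated file size in GB (10^9 bytes), ignoring all overhead."""
    return params * bytes_per_weight / 1e9


sizes = {name: approx_size_gb(PARAMS, b) for name, b in BYTES_PER_WEIGHT.items()}
for name, gb in sizes.items():
    print(f"{name}: ~{gb:.1f} GB")
```

Under these assumptions the estimates match the 9.6 GB and 4.8 GB figures listed for bf16/ and the wi8 artifact, and the ~2.4 GB quoted for the 4-bit recipe later in this card.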
How to use
On Android via LiteRT-LM
This is the deployment target. See the LiteRT-LM Android Getting Started guide. Download gemmacademy-fractions-v1-wi8.litertlm to the device's app-private storage, then load it with the LiteRT-LM Kotlin API.
On a desktop via the litert-lm CLI
uv tool install litert-lm
litert-lm run \
--from-huggingface-repo=jtmuller/gemmacademy-fractions-v1 \
gemmacademy-fractions-v1-wi8.litertlm \
--prompt="What is the Henderson Pizza Method?"
Verified working on Apple Silicon MacBooks at usable decode speeds.
As a base for further fine-tuning
Use the bf16/ directory as a transformers-compatible checkpoint, or apply lora-adapter/ on top of google/gemma-4-E2B-it.
Training details
| Field | Value |
|---|---|
| Base model | google/gemma-4-E2B-it (~4.8B params) |
| Method | LoRA via Unsloth |
| LoRA rank | 128 |
| LoRA alpha | 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Precision | BF16 |
| Optimizer | AdamW (8-bit) |
| Learning rate | 2e-4, linear schedule |
| Epochs | 3 |
| Batch size | 4 (× 2 grad accum = effective 8) |
| Hardware | 1× NVIDIA RTX 5090 (32 GB) |
| Training time | ~80 seconds |
| Final train loss | 0.76 |
| Final eval loss | 2.43 (held-out 50 examples) |
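A few derived quantities follow directly from the table. The sketch below is plain arithmetic rather than the actual training script: the dictionary keys are illustrative names, the LoRA scale assumes the standard alpha / rank convention, and `n_train_examples` is a placeholder, since this card does not say whether the 50 held-out examples are part of the 500 pairs.

```python
import math

# Values copied from the table above; key names are illustrative only.
config = {
    "lora_rank": 128,
    "lora_alpha": 128,
    "per_device_batch_size": 4,
    "grad_accum_steps": 2,
    "epochs": 3,
    "learning_rate": 2e-4,
}

# Effective batch size: per-device batch times gradient accumulation steps.
effective_batch = config["per_device_batch_size"] * config["grad_accum_steps"]

# Standard LoRA scaling factor applied to the low-rank update (alpha / rank).
lora_scale = config["lora_alpha"] / config["lora_rank"]


def optimizer_steps(n_train_examples: int) -> int:
    """Total optimizer updates, assuming a partial final batch still steps."""
    steps_per_epoch = math.ceil(n_train_examples / effective_batch)
    return steps_per_epoch * config["epochs"]


print(effective_batch, lora_scale, optimizer_steps(500))
```

With rank equal to alpha the update is applied at scale 1.0, and a run over a few hundred examples amounts to under 200 optimizer steps, which is consistent with a training time measured in seconds.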
The notable train/eval loss gap (~1.7) is documented as an open question in our engineering notes. Our hypothesis is that the synthetic dataset has stylistic regularity the model memorizes faster than it generalizes; behavioral evaluation (see eval-results-*.md) was used as the primary quality signal rather than raw loss.
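One way to get a feel for the size of that gap is to convert both losses to perplexity. This is a minimal sketch that assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention but is not stated in this card.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy_nats)


train_ppl = perplexity(0.76)  # final train loss from the table
eval_ppl = perplexity(2.43)   # final eval loss from the table

print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
print(f"ratio            ~ {eval_ppl / train_ppl:.1f}x")
```

Under that assumption the model is several times more uncertain per token on held-out examples than on training examples, which fits the memorization hypothesis above without confirming it.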
Data
The 500 training Q&A pairs in qa-fractions.jsonl were synthetically generated. We wrote ~2,000 words of "lesson content" describing a fictional 4th-grade fractions class taught by a fictional teacher (Mrs. Henderson) using a fictional procedural method (the "Pizza Method"), then generated diverse Q&A pairs from that content using Gemma 4 26B (AWQ-4bit) served via vLLM as the data generator. The full prompt is in qa_generation_prompt.md.
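To inspect the dataset before reusing it, a small reader like the one below may help. It is a sketch that assumes only that each non-empty line of qa-fractions.jsonl is a JSON object. The field names are not documented in this card, so the code reports whatever keys it finds instead of assuming any.

```python
import json
from collections import Counter


def summarize_jsonl(path: str) -> tuple[int, dict[str, int]]:
    """Count records and tally how often each top-level key appears."""
    n_records = 0
    key_counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # assumed to be a JSON object
            n_records += 1
            key_counts.update(record.keys())
    return n_records, dict(key_counts)
```

Calling `summarize_jsonl("qa-fractions.jsonl")` on a local copy should report the record count (500 according to this card) and the set of keys shared by the records.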
We use a fictional classroom rather than a real one because (a) we don't have access to a real classroom's materials at this stage, and (b) the fictional method gives us a clean evaluation signal: the base Gemma 4 E2B has never heard of "the Pizza Method," so any positive answer about it is provably learned from fine-tuning rather than recalled from pretraining.
In real deployment, Gemmacademy will produce models from actual teachers' actual lesson materials. The fictional setup is a controlled stand-in for evaluation.
Evaluation
We evaluated 20 questions across three categories:
- 10 classroom-specific: about the Pizza Method, Mrs. Henderson's specific catchphrases, etc. Only the fine-tune should know these.
- 5 general 4th-grade fractions: both base and fine-tune should handle these.
- 5 off-topic: capital of France, World War 2, etc. Both should respond reasonably; we want the fine-tune to not lose general capability.
Detailed results in eval-results-ship.md.
Headline finding: the deployed wi8 model captures the lesson essence on roughly 6 of 10 classroom-specific questions (e.g., produces "equal slices, equal fractions" or close paraphrases, draws procedurally correct fraction diagrams), while the base model uniformly responds with "I don't know what that is" or asks for context. On general fractions and off-topic questions, the fine-tune is on par with base. The fine-tune is a clear improvement on the target task without obvious capability regression elsewhere.
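For anyone repeating this kind of behavioral eval, per-category results can be tallied with a helper like the one below. It is a hypothetical sketch, not the harness used for eval-results-ship.md: the category names are illustrative labels for the three groups above, and the pass/fail judgments are assumed to come from manual grading.

```python
from collections import defaultdict

# Category sizes as described in this section (20 questions in total).
CATEGORY_SIZES = {
    "classroom_specific": 10,
    "general_fractions": 5,
    "off_topic": 5,
}


def tally(judgments):
    """Turn (category, passed) pairs into {category: (passed, graded)}."""
    passed = defaultdict(int)
    graded = defaultdict(int)
    for category, ok in judgments:
        if category not in CATEGORY_SIZES:
            raise ValueError(f"unknown category: {category}")
        graded[category] += 1
        passed[category] += 1 if ok else 0
    return {c: (passed[c], graded[c]) for c in CATEGORY_SIZES}
```

Running `tally` once for the base model and once for the fine-tune gives the side-by-side counts that a headline such as "roughly 6 of 10" summarizes.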
A finding worth flagging: int4 doesn't survive small-magnitude LoRA fine-tunes
The hackathon's "ship to the cheapest possible phone" framing pushed us hard toward int4 quantization (dynamic_wi4_afp32, ~2.4 GB). It didn't work for our fine-tune.
We tested four configurations with LoRA rank 128 (see eval-results-r128.md) and found:
- dynamic_wi4_afp32 (4-bit dynamic, the official Gemma 4 E2B recipe): The fine-tune signal is largely lost. The model produces pizza-fractions content but doesn't reproduce the specific catchphrases or follow the procedural rules from training. Base Gemma 4 E2B in this same recipe also degrades on world knowledge ("Australia won the 2022 World Cup", which is fabricated).
- weight_only_wi4_afp32 (4-bit weight-only, alternative algorithm): Catastrophically broken; produces degenerate token loops on every prompt.
- dynamic_wi8_afp32 (8-bit, doubles file size to 4.8 GB): Captures the lesson essence cleanly, restores base capability on general questions. This is what we shipped.
- BF16 (no quantization, 9.6 GB): Reproduces near-verbatim training-set phrasing. Confirms the LoRA learned what we wanted at training time.
Diagnosis: rank-128 LoRA produces weight deltas that are large enough to land in different bins after 8-bit quantization (256 bins per channel) but get rounded back to base values after 4-bit quantization (16 bins per channel). The base Gemma 4 E2B survives int4 because Google trained it with quantization-aware training; that benefit doesn't transfer to a post-hoc LoRA on top.
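The rounding effect in this diagnosis can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the dynamic_wi4_afp32 or dynamic_wi8_afp32 implementation. It uses plain symmetric round-to-nearest quantization with one scale per row, and the weight and delta magnitudes (standard deviations of 0.02 and 0.0005) are made up for the demonstration.

```python
import random


def quantize_row(row, bits):
    """Symmetric round-to-nearest integer codes for one channel (row)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in row) / qmax
    return [max(-qmax, min(qmax, round(w / scale))) for w in row]


def fraction_of_codes_changed(base, delta, bits):
    """Share of weights whose integer code differs once the delta is merged."""
    changed = total = 0
    for base_row, delta_row in zip(base, delta):
        merged_row = [b + d for b, d in zip(base_row, delta_row)]
        q_base = quantize_row(base_row, bits)
        q_merged = quantize_row(merged_row, bits)
        changed += sum(x != y for x, y in zip(q_base, q_merged))
        total += len(base_row)
    return changed / total


rng = random.Random(0)
# Assumed magnitudes: base weights ~ N(0, 0.02), fine-tune delta ~ N(0, 0.0005).
base = [[rng.gauss(0.0, 0.02) for _ in range(256)] for _ in range(64)]
delta = [[rng.gauss(0.0, 0.0005) for _ in range(256)] for _ in range(64)]

frac4 = fraction_of_codes_changed(base, delta, bits=4)
frac8 = fraction_of_codes_changed(base, delta, bits=8)
print(f"4-bit codes changed by the delta: {frac4:.1%}")
print(f"8-bit codes changed by the delta: {frac8:.1%}")
```

With a delta this small relative to the 4-bit step size, most merged weights quantize to the same code as the base weights, while at 8 bits a much larger share moves to a different code. How closely this mirrors the real recipes depends on their grouping and scaling details, which are not covered here.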
Implication for others: If you're fine-tuning Gemma 4 with LoRA and need int4 deployment, you'll likely need either (a) much higher LoRA ranks than 128, (b) quantization-aware fine-tuning, or (c) full fine-tuning. Standard LoRA ranks of 8-32 will almost certainly produce a model that looks fine in BF16 and breaks at int4. We landed on shipping the wi8 artifact (4.8 GB) as the right quality/size trade-off for our use case.
Limitations
- Single subject, single grade. This v1 model only knows 4th-grade fractions content with the Pizza Method. Out-of-distribution questions get reasonable but generic responses.
- Fictional content. Mrs. Henderson does not exist. The Pizza Method is invented. Real teachers using the pipeline get models trained on their real materials; this artifact is a demonstration.
- Synthetic data limitations. The training data is generator-quality, not human-curated. There will be subtle inaccuracies and stylistic regularities the model picks up.
- Eval set size. 50 held-out examples is small for reliable loss measurement. Behavioral eval is the more trustworthy signal at this scale.
- No safety tuning beyond Gemma 4 base. The base model's safety properties pass through; we have not added or evaluated additional safety guarantees.
- English only. Multilingual support is on the v2 roadmap.
License
This model is a derivative of google/gemma-4-E2B-it and is governed by the Gemma Terms of Use.
Citation
@misc{gemmacademy-fractions-v1,
title = {Gemmacademy: A Fine-Tuned On-Device Tutor for 4th Grade Fractions},
author = {Muller, Joseph},
year = {2026},
note = {Built for the Google Gemma 4 Hackathon},
url = {https://huggingface.co/jtmuller/gemmacademy-fractions-v1}
}
Project links
- Hackathon writeup: (link to Kaggle writeup once published)
- Demo video: (link to YouTube demo once published)
- Source code: (link to GitHub once published)