nano-proofread

Fixes the writing errors a spell-checker can't see: their going to win → they're going to win, its raining again → it's raining again, the the cat sat → the cat sat. The mistakes are real words (their/there/they're are all spelled correctly), so a spell-checker stays silent; which one is right depends on the surrounding words. nano-proofread is a ~1M-parameter (1,016,960) byte-level transformer that reads the context and picks the right one.
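
"Byte-level" means the model reads raw UTF-8 bytes rather than word or subword tokens, so its and it's differ by a single byte in the input. A minimal sketch of that encoding follows; the function names are hypothetical, and the tokenizer in the repo may add special tokens or differ in other details.

```python
def encode(text: str) -> list[int]:
    """Map text to a sequence of byte IDs in the range 0..255."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Invert encode(); malformed byte runs are replaced instead of raising."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode("it's")
print(ids)          # [105, 116, 39, 115]
print(decode(ids))  # it's
```

No vocabulary file is needed for this step, which is one reason byte-level input suits a model this small.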

Scope (a fixed confusion set, not general grammar): their/there/they're, your/you're, its/it's, then/than, to/too, could have/could of, and doubled words.
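
To make the scope concrete, here is a sketch of the confusion set as data, plus a helper that lists where a sentence touches it. The names `CONFUSION_SETS`, `candidates`, and `doubled_words` are illustrative assumptions, not the repo's API. Note that this only finds candidate positions; deciding which member is correct is the model's job.

```python
import re

# Hypothetical encoding of the scope listed above; the repo may store it differently.
CONFUSION_SETS = [
    ("their", "there", "they're"),
    ("your", "you're"),
    ("its", "it's"),
    ("then", "than"),
    ("to", "too"),
    ("could have", "could of"),
]

WORD = re.compile(r"[a-z']+")


def candidates(text: str) -> list[tuple[str, tuple[str, ...]]]:
    """Return (member found, its confusion set) for each member present in text."""
    lowered = text.lower()
    found = []
    for group in CONFUSION_SETS:
        for member in group:
            # Guards on both sides so "to" does not match inside "tomorrow" or "into".
            pattern = r"(?<![a-z'])" + re.escape(member) + r"(?![a-z'])"
            if re.search(pattern, lowered):
                found.append((member, group))
    return found


def doubled_words(text: str) -> list[str]:
    """Return every word that is immediately repeated."""
    words = WORD.findall(text.lower())
    return [a for a, b in zip(words, words[1:]) if a == b]


print(candidates("its raining again"))   # [('its', ('its', "it's"))]
print(doubled_words("the the cat sat"))  # ['the']
```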

Benchmark

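|                              | model  | best context-free script |
|------------------------------|--------|--------------------------|
| overall (held-out, N=4000)   | 100.0% | 49.2%                    |
| context slice (N=2030)       | 100.0% | 0.0%                     |
| out-of-distribution (N=25)   | 92.0%  | 36.0%                    |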

The script is 0% on the context slice by construction: it can only emit its default member, which is wrong exactly where context decides. The number that matters is the last row: on 25 natural phrases matching no training template, the model beats the script by 56 points, which suggests it learned the grammatical cue rather than memorised sentences. (An earlier 14-template version scored 99% on a same-template split but failed on real phrases; the frame-based generator and this OOD test are what keep the result honest.)
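
Why a context-free script has a ceiling can be shown in a few lines. The sketch below is an illustration, not the benchmark script from the repo: `GROUP`, `cases`, and `make_baseline` are hypothetical, and how the real context slice is built is not specified here. It tries every possible default for one confusion set against three inputs that each need a different member.

```python
GROUP = ("their", "there", "they're")

# (input with an error, correct output); each case needs a different member.
cases = [
    ("there going to win", "they're going to win"),
    ("they're house is big", "their house is big"),
    ("put it over their", "put it over there"),
]


def make_baseline(default: str):
    """Build a fixer that always emits `default`, ignoring neighbouring words."""
    def fix(text: str) -> str:
        return " ".join(default if w in GROUP else w for w in text.split())
    return fix


scores = {}
for default in GROUP:
    fix = make_baseline(default)
    scores[default] = sum(fix(src) == tgt for src, tgt in cases)

print(scores)  # {'their': 1, 'there': 1, "they're": 1}
```

Whichever default is chosen, the script gets one case right and two wrong, because nothing in it looks at the surrounding words. For the last benchmark row, the gap is 92.0 - 36.0 = 56 points; with N=25 that corresponds to 23 versus 9 phrases correct.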

Usage

pip install torch safetensors numpy

# grab modeling_nano_proofread.py + config.json from the GitHub repo
from modeling_nano_proofread import load, proofread
m = load("model.safetensors", "config.json")
proofread(m, "their going to win")   # -> "they're going to win"
proofread(m, "its raining again")    # -> "it's raining again"

How it was trained

The training data is 100% code-generated and correct by construction: build a correct phrase from ~65 grammatical frames with rich fillers, then inject one error (swap the confusion word, or double a word); ~15% of examples are identity pairs with no error injected. Training is SFT with the prompt masked. The model is a ~1M-param byte-level transformer (RMSNorm, RoPE, GQA, SwiGLU), trained for 24k steps with AdamW and a cosine LR schedule. Full recipe and reproduction are in the GitHub repo.
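
A minimal sketch of the recipe above, under loud assumptions: the two frames, the `SWAPS` table, the 50/50 split between swapping and doubling, and the newline separator are all invented for illustration (the real generator uses ~65 frames and lives in the GitHub repo). The `-100` label value is the default `ignore_index` of PyTorch's cross-entropy loss, used here to show what masking the prompt can look like.

```python
import random

# Hypothetical frames and fillers; every filled frame is a correct phrase.
FRAMES = [
    ("{subj} going to {verb}", {"subj": ["they're", "you're"], "verb": ["win", "leave"]}),
    ("{det} {noun} is here", {"det": ["their", "your"], "noun": ["dog", "car"]}),
]
# Wrong members that can replace each correct word (assumed, partial).
SWAPS = {
    "they're": ["their", "there"],
    "their": ["there", "they're"],
    "you're": ["your"],
    "your": ["you're"],
}
IDENTITY_RATE = 0.15  # share of pairs left error-free
IGNORE = -100         # label value skipped by the loss


def make_correct(rng: random.Random) -> str:
    template, slots = rng.choice(FRAMES)
    return template.format(**{k: rng.choice(v) for k, v in slots.items()})


def inject_error(words: list[str], rng: random.Random) -> list[str]:
    """Apply exactly one error: swap a confusion word, or double a word."""
    words = list(words)
    swappable = [i for i, w in enumerate(words) if w in SWAPS]
    if swappable and rng.random() < 0.5:
        i = rng.choice(swappable)
        words[i] = rng.choice(SWAPS[words[i]])
    else:
        i = rng.randrange(len(words))
        words.insert(i, words[i])
    return words


def make_example(rng: random.Random) -> tuple[str, str]:
    """Return (source, target); the target is correct by construction."""
    target = make_correct(rng)
    if rng.random() < IDENTITY_RATE:
        return target, target
    return " ".join(inject_error(target.split(), rng)), target


def build_labels(source: str, target: str, sep: bytes = b"\n"):
    """Byte-level input IDs plus labels with the prompt positions masked out."""
    prompt = list(source.encode("utf-8") + sep)
    completion = list(target.encode("utf-8"))
    return prompt + completion, [IGNORE] * len(prompt) + completion


rng = random.Random(0)
for _ in range(3):
    print(make_example(rng))
```

Because the error is injected into a phrase that is already known to be correct, every pair has a trustworthy label without any human annotation. The next-token shift between inputs and labels is left to the training loop in this sketch.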

MIT. Built by Vuk Rosić.

Safetensors. Model size: 1.02M params. Tensor type: F32.