nano-spell

Corrects a misspelled word to its intended word: recieve → receive, teh → the, freind → friend, thier → their. Which real word a typo most likely meant lives only in a frequency-weighted lexicon, so the obvious script (nearest dictionary word by edit distance) guesses wrong whenever the typo lands just as close to another word. The model is a ~1M-parameter (1,016,960) byte-level transformer, trained 100% on code-generated data.
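To make "the obvious script" concrete, here is a minimal sketch of two nearest-dictionary baselines. It assumes plain Levenshtein distance and a made-up toy lexicon; the distance metric, tie-breaking rule, and lexicon used for the benchmark are not stated here, so `edit_distance`, `LEXICON`, `nearest_naive`, and `nearest_freq` are all illustrative names.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy lexicon: word -> made-up relative frequency.
LEXICON: Dict[str, int] = {
    "the": 1000, "there": 350, "their": 300, "then": 120, "ten": 60, "tea": 40,
}


def nearest_naive(typo: str, lexicon: Dict[str, int]) -> str:
    """Closest word by edit distance; ties broken alphabetically (an assumption)."""
    return min(sorted(lexicon), key=lambda w: edit_distance(typo, w))


def nearest_freq(typo: str, lexicon: Dict[str, int]) -> str:
    """Closest word by edit distance; ties broken by higher frequency."""
    return min(lexicon, key=lambda w: (edit_distance(typo, w), -lexicon[w]))
```

With this toy lexicon, a transposition such as teh sits two Levenshtein edits from "the" but only one from "ten" and "tea", so both scripts miss it. For "thn", frequency breaks the tie between "the", "then", and "ten", which is the only case where the two scripts differ.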

Benchmark (held-out, seed 987654321, N=4000)

| | model | identity | nearest-naive | nearest-freq |
|---|---|---|---|---|
| overall | 86.8% | 16.8% | 68.0% | 73.7% |
| hard slice (naive wrong, N=1280) | 68.7% | - | 0.0% | 29.8% |

Unlike pure lexicon-recovery tasks (where a frequency dictionary ties the model), nano-spell beats both scripts, including the frequency dictionary. On the hard slice (the typos a nearest-dictionary lookup gets wrong) the model recovers 68.7% where the frequency dictionary manages only 29.8%. A clean must-beat-a-script result with no asterisk. Out-of-vocab words: 14%. The prior is the ~480-word training vocabulary; reported, not hidden.
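The hard slice can be sketched as a filter over the held-out pairs, assuming it is defined as the items where the nearest-naive script's output differs from the label. Under that definition nearest-naive scores 0.0% on the slice by construction, and the slice size agrees with the overall numbers: 1280 of 4000 is 32.0%, the complement of nearest-naive's 68.0%. The function `score` and the stub systems below are hypothetical, not the repo's evaluation code.

```python
from typing import Callable, Dict, Iterable, Tuple

Corrector = Callable[[str], str]


def score(
    pairs: Iterable[Tuple[str, str]],
    systems: Dict[str, Corrector],
    naive_name: str = "nearest-naive",
) -> Tuple[Dict[str, Dict[str, float]], int]:
    """Return per-system accuracy overall and on the hard slice, plus the slice size."""
    pairs = list(pairs)
    naive = systems[naive_name]
    # Hard slice: items the naive script gets wrong.
    hard = [(typo, word) for typo, word in pairs if naive(typo) != word]
    report: Dict[str, Dict[str, float]] = {}
    for name, fn in systems.items():
        overall = sum(fn(t) == w for t, w in pairs) / len(pairs)
        hard_acc = sum(fn(t) == w for t, w in hard) / len(hard) if hard else float("nan")
        report[name] = {"overall": overall, "hard": hard_acc}
    return report, len(hard)


# Toy data and stub systems, for illustration only.
PAIRS = [("teh", "the"), ("recieve", "receive"), ("freind", "friend"), ("world", "world")]
NAIVE_GUESSES = {"teh": "ten", "recieve": "receive", "freind": "friend", "world": "world"}
SYSTEMS: Dict[str, Corrector] = {
    "identity": lambda typo: typo,                    # returns the input unchanged
    "nearest-naive": lambda typo: NAIVE_GUESSES[typo],
}
REPORT, HARD_N = score(PAIRS, SYSTEMS)
```

To score the model the same way, add an entry such as `"model": lambda typo: correct(m, typo)` to `SYSTEMS`.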

Usage

pip install torch safetensors numpy
# grab modeling_nano_spell.py + config.json from the GitHub repo
from modeling_nano_spell import load, correct
m = load("model.safetensors", "config.json")
correct(m, "recieve")   # -> "receive"
correct(m, "teh")       # -> "the"
correct(m, "world")     # -> "world"  (already correct, left alone)
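Since `correct` takes one word at a time, fixing running text needs a small wrapper. The sketch below is one way to do it, assuming the model expects a single lowercase alphabetic word (not confirmed here). `correct_text` is a hypothetical helper, and the stub stands in for the real model so the example runs without the weights.

```python
import re
from typing import Callable

WORD = re.compile(r"[A-Za-z]+")


def correct_text(text: str, correct_word: Callable[[str], str]) -> str:
    """Correct each alphabetic token, leaving punctuation and spacing untouched."""
    def fix(match: "re.Match[str]") -> str:
        word = match.group(0)
        fixed = correct_word(word.lower())
        # Restore a leading capital if the original token had one.
        return fixed.capitalize() if word[0].isupper() else fixed
    return WORD.sub(fix, text)


# Stub corrector for illustration; with the real model use
#   correct_text(text, lambda w: correct(m, w))
_STUB = {"teh": "the", "recieve": "receive"}


def stub_correct(word: str) -> str:
    return _STUB.get(word, word)
```

Calling `correct_text("Teh cat will recieve it.", stub_correct)` returns `"The cat will receive it."`.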

How it was trained

Code-generated data: sample a real word from a fixed, frequency-ordered ~480-word vocabulary (Zipf-weighted), inject 1–2 realistic typos (delete / insert / keyboard-neighbour / transpose / double), ~15% identity. Label correct by construction. SFT, prompt masked. ~1M-param byte-level transformer (RMSNorm, RoPE, GQA, SwiGLU), 12k steps, AdamW, cosine LR. Full recipe and reproduction in the GitHub repo.
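The data recipe above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repo's generator: `VOCAB` is a tiny stand-in for the ~480-word list, the Zipf exponent is taken to be 1, keyboard neighbours are limited to same-row QWERTY keys, and the five operations are chosen uniformly.

```python
import random
import string
from typing import Callable, List, Tuple

# Hypothetical stand-in for the frequency-ordered vocabulary (most frequent first).
VOCAB: List[str] = ["the", "their", "friend", "receive", "world", "because"]

# Same-row QWERTY neighbours (a simplification of "keyboard-neighbour").
NEIGHBOURS = {}
for row in ("qwertyuiop", "asdfghjkl", "zxcvbnm"):
    for i, ch in enumerate(row):
        NEIGHBOURS[ch] = row[max(0, i - 1):i] + row[i + 1:i + 2]


def delete(w: str, rng: random.Random) -> str:
    if len(w) < 2:          # never delete the last remaining character
        return w
    i = rng.randrange(len(w))
    return w[:i] + w[i + 1:]


def insert(w: str, rng: random.Random) -> str:
    i = rng.randrange(len(w) + 1)
    return w[:i] + rng.choice(string.ascii_lowercase) + w[i:]


def neighbour(w: str, rng: random.Random) -> str:
    i = rng.randrange(len(w))
    return w[:i] + rng.choice(NEIGHBOURS[w[i]]) + w[i + 1:]


def transpose(w: str, rng: random.Random) -> str:
    if len(w) < 2:
        return w
    i = rng.randrange(len(w) - 1)
    return w[:i] + w[i + 1] + w[i] + w[i + 2:]


def double(w: str, rng: random.Random) -> str:
    i = rng.randrange(len(w))
    return w[:i] + w[i] + w[i:]


OPS: List[Callable[[str, random.Random], str]] = [delete, insert, neighbour, transpose, double]
ZIPF_WEIGHTS = [1.0 / rank for rank in range(1, len(VOCAB) + 1)]  # exponent 1 assumed


def make_example(rng: random.Random, identity_rate: float = 0.15) -> Tuple[str, str]:
    """Return (input, label). The label is the sampled word, so it is correct by construction."""
    word = rng.choices(VOCAB, weights=ZIPF_WEIGHTS, k=1)[0]
    if rng.random() < identity_rate:
        return word, word               # identity example: already correct
    typo = word
    for _ in range(rng.choice([1, 2])):  # 1 or 2 typos
        typo = rng.choice(OPS)(typo, rng)
    return typo, word                    # two edits can occasionally cancel out
```

Passing a seeded `random.Random` makes the stream reproducible, which is what a fixed held-out seed relies on.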

MIT. Built by Vuk Rosić.

Safetensors. Model size: 1.02M params. Tensor type: F32.