---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tiny-model
- educational
- record-breaker
- ultra-small
- smallest-llm
- 80k-parameters
---

# TinyBuddy-80K

> 🏆 **RECORD ATTEMPT**: The smallest functional English-speaking language model on Hugging Face.
> **83,856 parameters** (~84K), challenging the NaA-IA/Small-ever record by being both tiny AND coherent.

**Mission**: Prove that under 100K parameters, a language model can still learn English patterns and generate recognizable text. The goal is not just to be small, but to be the smallest that *works*.

---

## Model Details

| Property | Value |
|---|---|
| **Parameters** | **83,856** (~84K) |
| Layers | 1 |
| Hidden size | 48 |
| Attention heads | 4 (query) / 2 (key-value) = GQA |
| FF intermediate size | 192 |
| Context length | 128 |
| Vocabulary | 1,024 tokens (BPE) |
| Architecture | Llama-style: RMSNorm, RoPE, SiLU/SwiGLU, tied embeddings |
| Precision | float32 |

### Parameter Breakdown

| Component | Parameters |
|---|---|
| Token Embedding (tied) | 49,152 |
| Attention (Q/K/V/O) | 6,912 |
| FeedForward (Gate/Up/Down) | 27,648 |
| Normalization (3× RMSNorm) | 144 |
| **Total** | **83,856** |

---

## Architecture

TinyBuddy-80K uses a **single transformer block** with:

- **RMSNorm** (pre-norm): efficient normalization
- **Grouped Query Attention**: 4 query heads, 2 KV heads (saves params)
- **RoPE** (Rotary Position Embeddings): relative position encoding
- **SwiGLU** (SiLU-gated MLP): modern activation
- **Tied embeddings**: input and output share weights (saves ~49K params!)

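
As a sanity check, the 83,856 total can be re-derived from the dimensions in the Model Details table. The sketch below is not the repository's code: it assumes bias-free linear layers, a head dimension of 48 / 4 = 12 shared by query and key-value heads, and no learned parameters in RoPE. The function name `count_parameters` is hypothetical.

```python
# Dimensions copied from the Model Details table.
VOCAB_SIZE = 1024
HIDDEN_SIZE = 48
NUM_QUERY_HEADS = 4
NUM_KV_HEADS = 2
FF_SIZE = 192
NUM_NORMS = 3  # two pre-norms inside the block plus the final norm


def count_parameters():
    """Return a per-component parameter count (assumes no bias terms)."""
    head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS   # 12
    kv_dim = NUM_KV_HEADS * head_dim            # 24, smaller than 48 because of GQA

    counts = {
        # Tied embeddings: one matrix serves as both input embedding and LM head.
        "embedding": VOCAB_SIZE * HIDDEN_SIZE,
        # Q and O are hidden x hidden; K and V are hidden x kv_dim.
        "attention": 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * kv_dim,
        # SwiGLU uses three projections: gate, up, and down.
        "feed_forward": 3 * HIDDEN_SIZE * FF_SIZE,
        # Each RMSNorm holds one scale vector of length hidden.
        "norms": NUM_NORMS * HIDDEN_SIZE,
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, value in count_parameters().items():
        print(f"{name:>12}: {value:,}")
```

Under these assumptions the attention term comes to 6,912 (2,304 each for Q and O, 1,152 each for K and V), and the four components sum to 83,856.
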
```
Input → Embedding → [RMSNorm → GQA Attention → +] → [RMSNorm → SwiGLU FFN → +] → RMSNorm → LM Head → Output
```

---

## Training

- **Dataset**: TinyStories (~5,000 stories)
- **Tokenizer**: Byte-level BPE, 1,024 vocabulary (trained from scratch)
- **Optimizer**: AdamW (lr=5e-3, weight_decay=0.1)
- **Schedule**: Warmup (50 steps) + Cosine decay
- **Steps**: 1,000 on CPU
- **Hardware**: Single CPU core (the challenge!)

---

## Usage

```python
import json

import torch
from tokenizers import Tokenizer

from model import create_model  # local model definition (model.py)

# Load the model configuration
with open("config.json") as f:
    config = json.load(f)

# Build the model and load the trained weights
model = create_model(config)
model.load_state_dict(torch.load("output/model.pt", map_location="cpu"))
model.eval()

# Load the tokenizer and encode the prompt
tokenizer = Tokenizer.from_file("data/tokenizer.json")
prompt = "Once upon a time,"
encoded = tokenizer.encode(prompt)
ids = [1] + encoded.ids  # Prepend BOS (token id 1)
input_ids = torch.tensor([ids], dtype=torch.long)

# Generate without tracking gradients, then decode back to text
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=40)
print(tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True))
```

---

## Limitations

This model is **extremely small**: it has about as many parameters as a 290×290 grayscale image has pixels (84,100).

**What works:**

- Basic word patterns and short phrases
- Recognizable English-like structure
- Story-like opening sentences

**What's broken:**

- Very limited coherence (1–2 sentences max)
- High repetition
- No factual knowledge or reasoning
- Limited vocabulary diversity

This model exists purely to explore the **lower bounds of language modeling**. It shows that even at 84K parameters, a neural network can capture statistical patterns in English text.

---

## The Record

| Model | Parameters | Speaks English? |
|---|---|---|
| NaA-IA/Small-ever | 112 | ❌ No |
| **TinyBuddy-80K** | **83,856** | **✅ YES** |

TinyBuddy-80K may not be the absolute smallest model ever, but **it's the smallest that actually generates recognizable English text**. That's the real achievement.

---

## Citation

```bibtex
@misc{tinybuddy80k,
  title = {TinyBuddy-80K: An 84K parameter Llama-style model that speaks English},
  year  = {2026},
  note  = {Record attempt: smallest functional English text generator.}
}
```

**LONG LIVE TINYBUDDY-80K** 🚀