---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tiny-model
- educational
- record-breaker
- ultra-small
- smallest-llm
- 80k-parameters
---

# TinyBuddy-80K

> 🏆 **RECORD ATTEMPT**: The smallest functional English-speaking language model on Hugging Face.
> **83,856 parameters** (~84K). NaA-IA/Small-ever is smaller at 112 parameters, but it does not produce English; TinyBuddy aims to be both tiny AND coherent.

**Mission**: Show that a language model with fewer than 100K parameters can still learn English patterns and generate recognizable text. The goal is not the smallest model outright, but the smallest that *works*.

---

## Model Details

| Property | Value |
|---|---|
| **Parameters** | **83,856** (~84K) |
| Layers | 1 |
| Hidden size | 48 |
| Attention heads | 4 query / 2 key-value (GQA) |
| FF intermediate size | 192 |
| Context length | 128 |
| Vocabulary | 1,024 tokens (BPE) |
| Architecture | Llama-style: RMSNorm, RoPE, SiLU/SwiGLU, tied embeddings |
| Precision | float32 |

### Parameter Breakdown

| Component | Parameters |
|---|---|
| Token Embedding (tied) | 49,152 |
| Attention (Q/K/V/O) | 6,912 |
| FeedForward (Gate/Up/Down) | 27,648 |
| RMSNorm (2 in the block + 1 final) | 144 |
| **Total** | **83,856** |
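
The total can be reproduced from the values in the Model Details table. The sketch below is only arithmetic, not code from this repository. It assumes bias-free linear layers and a head dimension of 48 / 4 = 12, neither of which is stated explicitly above, and the function name `count_parameters` is hypothetical. Under those assumptions the attention projections come to 6,912 parameters and the sum matches the stated 83,856.

```python
def count_parameters(vocab=1024, hidden=48, n_heads=4, n_kv_heads=2,
                     ffn=192, n_layers=1):
    """Count parameters of a Llama-style model with tied embeddings.

    Assumes no bias terms and head_dim = hidden // n_heads.
    """
    head_dim = hidden // n_heads

    # The token embedding is shared with the LM head, so it is counted once.
    embedding = vocab * hidden

    # Q and O use all query heads; K and V use only the key-value heads.
    q_proj = hidden * n_heads * head_dim
    kv_proj = 2 * hidden * n_kv_heads * head_dim
    o_proj = n_heads * head_dim * hidden
    attention = n_layers * (q_proj + kv_proj + o_proj)

    # SwiGLU has three matrices: gate, up, and down.
    feed_forward = n_layers * 3 * hidden * ffn

    # Two RMSNorm weight vectors per block plus one final norm.
    norms = (2 * n_layers + 1) * hidden

    total = embedding + attention + feed_forward + norms
    return {
        "embedding": embedding,
        "attention": attention,
        "feed_forward": feed_forward,
        "norms": norms,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in count_parameters().items():
        print(f"{name:>12}: {value:,}")
```

Setting `n_kv_heads=4` in the same function gives the cost of standard multi-head attention, which makes the saving from grouped-query attention easy to compare.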

---

## Architecture

TinyBuddy-80K uses a **single transformer block** with:

- **RMSNorm** (pre-norm) — efficient normalization
- **Grouped Query Attention** — 4 query heads, 2 KV heads (saves params)
- **RoPE** (Rotary Position Embeddings) — relative position encoding
- **SwiGLU** (SiLU-gated MLP) — modern activation
- **Tied embeddings** — input and output share weights (saves ~49K params!)

```
Input → Embedding → [RMSNorm → GQA Attention → +] → [RMSNorm → SwiGLU FFN → +] → RMSNorm → LM Head → Output
```
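
The data flow above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the repository's `model.py`: the class names (`RMSNorm`, `Block`, `TinyLM`), the bias-free linear layers, the RoPE base of 10000, and the half-split rotation convention are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal root-mean-square of the last dimension.
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight


def rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim). Rotates (first-half, second-half) pairs.
    seq, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=x.device).float() / dim))
    angles = torch.outer(torch.arange(seq, device=x.device).float(), inv_freq)
    cos, sin = angles.cos(), angles.sin()  # each (seq, head_dim / 2)
    x1, x2 = x[..., : dim // 2], x[..., dim // 2:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class Block(nn.Module):
    def __init__(self, dim=48, n_heads=4, n_kv_heads=2, ffn_dim=192):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o = nn.Linear(dim, dim, bias=False)
        self.gate = nn.Linear(dim, ffn_dim, bias=False)
        self.up = nn.Linear(dim, ffn_dim, bias=False)
        self.down = nn.Linear(ffn_dim, dim, bias=False)
        self.norm1, self.norm2 = RMSNorm(dim), RMSNorm(dim)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.norm1(x)
        q = self.q(h).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k(h).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v(h).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)
        # GQA: each key-value head is shared by n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        attn = scores.softmax(dim=-1) @ v  # (b, heads, t, head_dim)
        x = x + self.o(attn.transpose(1, 2).reshape(b, t, -1))
        h = self.norm2(x)
        # SwiGLU: SiLU-gated product of two projections, then project back down.
        return x + self.down(F.silu(self.gate(h)) * self.up(h))


class TinyLM(nn.Module):
    def __init__(self, vocab=1024, dim=48):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.block = Block(dim)
        self.norm = RMSNorm(dim)

    def forward(self, ids):
        x = self.norm(self.block(self.embed(ids)))
        return x @ self.embed.weight.T  # tied LM head reuses the embedding matrix
```

Calling `TinyLM()(torch.randint(0, 1024, (1, 16)))` returns logits of shape `(1, 16, 1024)`. Weights trained with the real `model.py` are not expected to load into this sketch, since parameter names and the RoPE convention may differ.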

---

## Training

- **Dataset**: TinyStories (~5,000 stories)
- **Tokenizer**: Byte-level BPE, 1,024 vocabulary (trained from scratch)
- **Optimizer**: AdamW (lr=5e-3, weight_decay=0.1)
- **Schedule**: Warmup (50 steps) + Cosine decay
- **Steps**: 1,000 on CPU
- **Hardware**: Single CPU core (the challenge!)
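
The learning-rate schedule listed above can be sketched as a plain function of the step index. The card does not say how the warmup is shaped or what the final learning rate is, so this version assumes a linear warmup and a cosine decay to zero; `lr_at` and `min_lr` are hypothetical names.

```python
import math


def lr_at(step, peak_lr=5e-3, warmup_steps=50, total_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 25, 49, 50, 500, 999):
        print(step, f"{lr_at(step):.6f}")
```

With PyTorch, a function like this could be applied by setting `param_group["lr"]` on the AdamW optimizer before each step.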

---

## Usage

```python
import json

import torch
from tokenizers import Tokenizer

from model import create_model  # local model.py

# Load the config and build the model
with open("config.json") as f:
    config = json.load(f)

model = create_model(config)
model.load_state_dict(torch.load("output/model.pt", map_location="cpu"))
model.eval()

# Load the tokenizer and encode the prompt
tokenizer = Tokenizer.from_file("data/tokenizer.json")

prompt = "Once upon a time,"
encoded = tokenizer.encode(prompt)
ids = [1] + encoded.ids  # Prepend BOS (token id 1)
input_ids = torch.tensor([ids], dtype=torch.long)

# Generate without tracking gradients
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=40)
print(tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True))
```
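
For readers unfamiliar with the `temperature` and `top_k` arguments, the sketch below shows what they conventionally mean when picking one next token from a list of logits. It uses only the standard library and is not taken from `model.py`, whose `generate` may differ in detail; `sample_top_k` is a hypothetical name.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id from the top_k highest logits."""
    # Keep only the ids of the k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    example_logits = [0.1, 2.0, -1.0, 0.5]
    print(sample_top_k(example_logits, temperature=0.8, top_k=2))
```

With `top_k=1` this reduces to greedy decoding, which tends to make the repetition noted under Limitations more visible.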

---

## Limitations

This model is **extremely small**: it has fewer parameters than a 290×290 grayscale image has pixels (84,100).

**What works:**
- Basic word patterns and short phrases
- Recognizable English-like structure
- Story-like opening sentences

**What's broken:**
- Very limited coherence (1–2 sentences max)
- High repetition
- No factual knowledge or reasoning
- Limited vocabulary diversity

This model exists purely to explore the **lower bounds of language modeling**. It shows that even at ~84K parameters, a neural network can capture some statistical patterns in English text.

---

## The Record

| Model | Parameters | Speaks English? |
|---|---|---|
| NaA-IA/Small-ever | 112 | ❌ No |
| **TinyBuddy-80K** | **83,856** | **✅ YES** |

TinyBuddy-80K may not be the absolute smallest model ever, but **it's the smallest that actually generates recognizable English text**. That's the real achievement.

---

## Citation

```bibtex
@misc{tinybuddy80k,
  title  = {TinyBuddy-80K: An 84K parameter Llama-style model that speaks English},
  year   = {2026},
  note   = {Record attempt: smallest functional English text generator.}
}
```

**LONG LIVE TINYBUDDY-80K** 🚀