---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- tiny-model
- educational
- record-breaker
- ultra-small
- smallest-llm
- 80k-parameters
---

# TinyBuddy-80K

> **RECORD ATTEMPT**: The smallest functional English-speaking language model on Hugging Face.
> **83,856 parameters** (~84K). It challenges the NaA-IA/Small-ever record by being both tiny and coherent.

**Mission**: Prove that a language model with fewer than 100K parameters can still learn English patterns and generate recognizable text. The goal is not just the smallest model, but the smallest one that *works*.

---

## Model Details

| Property | Value |
|---|---|
| **Parameters** | **83,856** (~84K) |
| Layers | 1 |
| Hidden size | 48 |
| Attention heads | 4 query / 2 key-value (GQA) |
| FF intermediate size | 192 |
| Context length | 128 tokens |
| Vocabulary | 1,024 tokens (BPE) |
| Architecture | Llama-style: RMSNorm, RoPE, SiLU/SwiGLU, tied embeddings |
| Precision | float32 |

### Parameter Breakdown

| Component | Parameters |
|---|---|
| Token Embedding (tied) | 49,152 |
| Attention (Q/K/V/O) | 6,912 |
| FeedForward (Gate/Up/Down) | 27,648 |
| RMSNorm (3×) | 144 |
| **Total** | **83,856** |
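
The 83,856 total can be reproduced from the values listed under Model Details. The following is a minimal sketch, assuming that every linear layer is bias-free, that the head dimension is the hidden size divided by the number of query heads, and that each RMSNorm holds one weight per hidden unit. The function name `count_parameters` is hypothetical and not part of the repository.

```python
def count_parameters(vocab=1024, hidden=48, n_q_heads=4, n_kv_heads=2,
                     ffn=192, n_layers=1):
    """Count weights in a bias-free Llama-style model with tied embeddings.

    Assumptions (not confirmed by the model card): no bias terms, and
    head_dim = hidden // n_q_heads.
    """
    head_dim = hidden // n_q_heads              # 48 // 4 = 12
    kv_dim = n_kv_heads * head_dim              # 2 * 12 = 24

    embedding = vocab * hidden                  # shared by input embedding and LM head
    q_and_o = 2 * hidden * hidden               # query and output projections
    k_and_v = 2 * hidden * kv_dim               # narrower key/value projections (GQA)
    attention = n_layers * (q_and_o + k_and_v)
    feed_forward = n_layers * 3 * hidden * ffn  # gate, up and down projections
    norms = (2 * n_layers + 1) * hidden         # two per block plus the final norm

    counts = {
        "embedding": embedding,
        "attention": attention,
        "feed_forward": feed_forward,
        "norms": norms,
    }
    counts["total"] = sum(counts.values())
    return counts


counts = count_parameters()
for name, value in counts.items():
    print(f"{name:>12}: {value:,}")
```

Because the embedding matrix is tied to the LM head, it is counted once; an untied head would add another 1,024 × 48 weights.
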

---

## Architecture

TinyBuddy-80K uses a **single transformer block** with:

- **RMSNorm** (pre-norm): efficient normalization
- **Grouped Query Attention**: 4 query heads share 2 KV heads (saves parameters)
- **RoPE** (Rotary Position Embeddings): relative position encoding
- **SwiGLU** (SiLU-gated MLP): modern gated feed-forward layer
- **Tied embeddings**: input and output share weights (saves ~49K parameters)

```
Input → Embedding → [RMSNorm → GQA Attention → +] → [RMSNorm → SwiGLU FFN → +] → RMSNorm → LM Head → Output
```
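
To make the building blocks above concrete, here is a dependency-free sketch of RMSNorm, the SwiGLU feed-forward layer, the pre-norm residual wrapper, and the usual way query heads are mapped to shared KV heads. It operates on plain Python lists for readability and is not the repository's implementation; all function names, the `eps` value, and the head-grouping convention (consecutive query heads share a KV head) are assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU: down( silu(gate(x)) * up(x) ), with bias-free projections."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def ffn_block(x, norm_weight, w_gate, w_up, w_down):
    """Pre-norm residual: x + FFN(RMSNorm(x))."""
    y = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, y)]


def kv_head_for(query_head, n_q_heads=4, n_kv_heads=2):
    """Index of the KV head a query head reads from (assumed grouping)."""
    return query_head // (n_q_heads // n_kv_heads)


# With 4 query heads and 2 KV heads, heads 0-1 share KV head 0 and 2-3 share KV head 1.
print([kv_head_for(h) for h in range(4)])
```

The attention sub-block follows the same pre-norm residual pattern as `ffn_block`, with the feed-forward call replaced by attention over the sequence.
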

---

## Training

- **Dataset**: TinyStories (~5,000 stories)
- **Tokenizer**: Byte-level BPE, 1,024 vocabulary (trained from scratch)
- **Optimizer**: AdamW (lr=5e-3, weight_decay=0.1)
- **Schedule**: Warmup (50 steps) + Cosine decay
- **Steps**: 1,000 on CPU
- **Hardware**: Single CPU core (the challenge!)
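
The schedule listed above can be sketched as a plain function of the step index. The peak learning rate (5e-3), the 50 warmup steps, and the 1,000 total steps come from the list; the linear shape of the warmup and the decay floor of zero (`min_lr`) are assumptions, since the card does not state them. The function name `learning_rate` is hypothetical.

```python
import math


def learning_rate(step, peak_lr=5e-3, warmup_steps=50, total_steps=1000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    Assumed details: warmup is linear and the decay ends at min_lr = 0.
    """
    if step < warmup_steps:
        # Ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for step in (0, 49, 50, 525, 1000):
    print(step, learning_rate(step))
```

In a PyTorch training loop, a function like this would typically be applied by writing its value into each optimizer parameter group's `lr` before every step.
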

---

## Usage

```python
import json

import torch
from tokenizers import Tokenizer

from model import create_model  # local module (model.py)

# Load config
with open("config.json") as f:
    config = json.load(f)

# Create the model and load the trained weights
model = create_model(config)
model.load_state_dict(torch.load("output/model.pt", map_location="cpu"))
model.eval()

# Tokenize the prompt
tokenizer = Tokenizer.from_file("data/tokenizer.json")
prompt = "Once upon a time,"
encoded = tokenizer.encode(prompt)
ids = [1] + encoded.ids  # Add BOS
input_ids = torch.tensor([ids], dtype=torch.long)

# Generate (no gradients are needed at inference time)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=40)
print(tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True))
```
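
The `temperature` and `top_k` arguments passed to `model.generate` control how the next token is drawn. The repository's `generate` method is not shown here, so the following is only a sketch of what these two parameters conventionally mean, written with the standard library; `sample_top_k` is a hypothetical helper, not the model's actual code.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Draw one token id from the top_k highest logits after temperature scaling."""
    k = min(top_k, len(logits))
    # Indices of the k largest logits; all other tokens get zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, shifted by the maximum for numerical stability.
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy example with a 5-token vocabulary: only tokens 1 and 3 can be sampled.
toy_logits = [0.1, 2.0, -1.0, 1.5, 0.0]
print(sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=random.Random(0)))
```

With a 1,024-token vocabulary, `top_k=40` restricts each step to roughly the 4% most likely tokens, which usually helps a very small model avoid drifting into noise.
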

---

## Limitations

This model is **extremely small**: it has fewer parameters than a 290×290 grayscale image has pixels (84,100).

**What works:**

- Basic word patterns and short phrases
- Recognizable English-like structure
- Story-like opening sentences

**What's broken:**

- Very limited coherence (1 to 2 sentences at most)
- High repetition
- No factual knowledge or reasoning
- Limited vocabulary diversity

This model exists purely to explore the **lower bounds of language modeling**. It shows that even at ~84K parameters, a neural network can capture statistical patterns in English text.

---

## The Record

| Model | Parameters | Speaks English? |
|---|---|---|
| NaA-IA/Small-ever | 112 | No |
| **TinyBuddy-80K** | **83,856** | **Yes** |

TinyBuddy-80K may not be the absolute smallest model ever, but **it's the smallest that actually generates recognizable English text**. That's the real achievement.

---

## Citation

```bibtex
@misc{tinybuddy80k,
  title = {TinyBuddy-80K: An 84K parameter Llama-style model that speaks English},
  year = {2026},
  note = {Record attempt: smallest functional English text generator.}
}
```

**LONG LIVE TINYBUDDY-80K**