TinyBuddy-500K

⚠️ Educational / experimental model. TinyBuddy-500K is a from-scratch tiny Llama-style language model (~547K parameters) trained on a synthetic slice of TinyStories-style text. It is not a useful assistant — it is a working demonstration of training extremely small models from scratch. See the Limitations section.

Model description

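TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of TinyStories. The architecture borrows from the LLaMA design (RMSNorm, grouped-query attention, SiLU MLP) and ties the input and output embeddings. As the table below lists, it uses learned absolute position embeddings rather than the rotary embeddings LLaMA uses.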

| Hyperparameter | Value |
| --- | --- |
| Parameters | 547,296 (~547K) |
| Layers | 2 |
| Attention heads | 4 |
| Key-value heads (GQA) | 2 |
| Hidden size | 96 |
| MLP intermediate size | 384 |
| Context length | 512 |
| Vocab size | 2,048 (BPE trained from scratch) |
| Norm | RMSNorm |
| Activation | SiLU |
| Position embeddings | Learned absolute |
| Weight tying | Yes (tied embeddings) |
| Precision | float32 |
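
The attention shapes implied by these numbers can be worked out directly. The sketch below assumes the usual convention that the hidden size is split evenly across query heads; the class name `TinyBuddyShapes` is hypothetical and is not taken from the repository's modeling code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyBuddyShapes:
    # Values copied from the hyperparameter table.
    hidden_size: int = 96
    num_attention_heads: int = 4
    num_key_value_heads: int = 2
    intermediate_size: int = 384

    @property
    def head_dim(self) -> int:
        # Assumption: each query head gets an equal slice of the hidden state.
        return self.hidden_size // self.num_attention_heads

    @property
    def kv_width(self) -> int:
        # Output width of the key (or value) projection under GQA.
        return self.num_key_value_heads * self.head_dim

    @property
    def queries_per_kv_head(self) -> int:
        # Number of query heads that share one key/value head.
        return self.num_attention_heads // self.num_key_value_heads


shapes = TinyBuddyShapes()
print(shapes.head_dim, shapes.kv_width, shapes.queries_per_kv_head)
```

Under that assumption each head is 24-dimensional, the key and value projections are half the width of the query projection (48 vs. 96), and every key/value head serves two query heads.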

Training details

  • Data: Synthetic TinyStories-style corpus (~128K tokens)
  • Tokenizer: Custom byte-level BPE with a 2,048-token vocabulary
  • Optimizer: AdamW
  • Steps: ~300
  • Hardware: Single CPU core
  • Final loss: ~0.17
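
To put the reported numbers in perspective, the short calculation below converts the final loss to perplexity and compares corpus size to model size. It is a rough sketch under two assumptions that the card does not state: that the loss is mean cross-entropy in nats per token, and that "~128K tokens" means about 128,000.

```python
import math

final_loss = 0.17        # from the list above; assumed to be nats per token
corpus_tokens = 128_000  # "~128K tokens", read here as 128,000 (assumption)
parameters = 547_296     # from the hyperparameter table

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(final_loss)

# Ratio of training tokens to trainable parameters.
tokens_per_parameter = corpus_tokens / parameters

print(f"perplexity ~ {perplexity:.2f}")
print(f"tokens per parameter ~ {tokens_per_parameter:.2f}")
```

With fewer training tokens than parameters, a loss this low is at least consistent with the model memorizing much of its corpus. The card does not say whether the figure is a training or held-out loss.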

Usage

This model uses custom modeling code, so you must pass trust_remote_code=True.

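from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Eeppa/TinyBuddy-500K"

# trust_remote_code=True runs the custom modeling code shipped in the repo.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()  # inference mode

prompt = "Once upon a time, there was a little girl named Lily."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Gradients are not needed for generation.
# Note: with the stock transformers generate(), temperature and top_k only take
# effect when do_sample=True; this repo's custom code may handle sampling differently.
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))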
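
For readers unfamiliar with the `temperature` and `top_k` arguments, the following is a minimal pure-Python sketch of temperature plus top-k sampling over a list of logits. It illustrates the general technique only; the function name `sample_top_k` and the toy logits are made up for this example and are not the repository's or the transformers library's implementation.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the k highest-scoring token ids.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Unnormalized softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores for a 5-token vocabulary
rng = random.Random(0)
picks = [sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=rng) for _ in range(1000)]
print(sorted(set(picks)))
```

With `top_k=2`, only the two highest-scoring ids (0 and 1) can ever be drawn, no matter how many samples are taken.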

Limitations

This model is extremely small and was trained for a very short time on limited data.

What works:

  • Basic English patterns and short sentence structure
  • Simple story-like generation

What's broken:

  • Very limited coherence (usually breaks after 1–2 sentences)
  • High repetition
  • Poor long-range consistency
  • No real reasoning or factual knowledge
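
One way to put a number on the repetition problem is to measure how many word n-grams in a generated sample have already appeared earlier in that sample. The helper below is a hypothetical sketch, not part of the repository, and the two sample strings are made up rather than actual model output.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams that already occurred earlier in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    seen = set()
    repeats = 0
    for gram in ngrams:
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / len(ngrams)


# Made-up sample text, not actual model output.
looping = "the cat was happy . the cat was happy . the cat was happy ."
varied = "the cat sat on a mat and looked at the big red ball"
print(repeated_ngram_fraction(looping), repeated_ngram_fraction(varied))
```

A value near 0 means almost no repeated phrases, while values well above 0.5 indicate the kind of looping output described above.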

This model exists purely for educational purposes to explore the lower limits of language model size.

License

MIT

Citation

@misc{tinybuddy500k,
  title  = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
  year   = {2026},
  note   = {Educational demonstration of extremely small language models.}
}

Built with Llama.
