TinyBuddy-500K

⚠️ Educational / experimental model. TinyBuddy-500K is a from-scratch tiny Llama-style language model (~547K parameters) trained on a synthetic slice of TinyStories-style text. It is not a useful assistant — it is a working demonstration of training extremely small models from scratch. See the Limitations section.

Model description

TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of TinyStories. The architecture largely follows the LLaMA design (RMSNorm, grouped-query attention, SiLU MLP, tied embeddings), but uses learned absolute position embeddings rather than LLaMA's rotary position embeddings (RoPE).

Hyperparameter	Value
Parameters	547,296 (~547K)
Layers	2
Attention heads	4
Key-Value heads (GQA)	2
Hidden size	96
MLP intermediate size	384
Context length	512
Vocab size	2,048 (BPE trained from scratch)
Norm	RMSNorm
Activation	SiLU
Position embeddings	Learned absolute
Weight tying	Yes (tied embeddings)
Precision	float32
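
The head and embedding sizes implied by the table can be worked out with a few lines of arithmetic. The sketch below assumes the common convention that the per-head dimension equals hidden size divided by the number of attention heads; the helper name `gqa_shapes` is hypothetical and is not part of the model's code.

```python
# Values copied from the hyperparameter table.
HIDDEN_SIZE = 96
NUM_ATTENTION_HEADS = 4
NUM_KEY_VALUE_HEADS = 2
VOCAB_SIZE = 2048
TOTAL_PARAMETERS = 547_296


def gqa_shapes(hidden_size, num_heads, num_kv_heads):
    """Derive grouped-query attention sizes (hypothetical helper).

    Assumes head_dim = hidden_size / num_heads, which the card does not state.
    """
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    head_dim = hidden_size // num_heads
    return {
        "head_dim": head_dim,
        "q_proj_out": num_heads * head_dim,      # width of the query projection
        "kv_proj_out": num_kv_heads * head_dim,  # width of each key/value projection
        "queries_per_kv_head": num_heads // num_kv_heads,
    }


shapes = gqa_shapes(HIDDEN_SIZE, NUM_ATTENTION_HEADS, NUM_KEY_VALUE_HEADS)

# With tied embeddings, the token embedding matrix is counted once.
embedding_parameters = VOCAB_SIZE * HIDDEN_SIZE
embedding_share = embedding_parameters / TOTAL_PARAMETERS

print(shapes)
print(f"embedding parameters: {embedding_parameters} ({embedding_share:.1%} of total)")
```

Under these assumptions each key/value head is shared by two query heads, and the token embedding alone accounts for roughly a third of the parameter budget, which is typical at this scale.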

Training details

Data: Synthetic TinyStories-style corpus (~128K tokens)
Tokenizer: Custom byte-level BPE with a 2,048-token vocabulary
Optimizer: AdamW
Steps: ~300
Hardware: Single CPU core
Final loss: ~0.17
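
To put these numbers in context, the sketch below converts the reported loss to perplexity and compares corpus size to model size. It assumes the loss is a mean per-token cross-entropy in nats (the usual PyTorch convention, not confirmed by the card) and reads "~128K tokens" as 128,000.

```python
import math

# Figures from the card; the token count is an approximation of "~128K".
final_loss = 0.17
training_tokens = 128_000
parameters = 547_296

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats (assumed).
perplexity = math.exp(final_loss)

# Ratio of training tokens to trainable parameters.
tokens_per_parameter = training_tokens / parameters

print(f"perplexity: {perplexity:.3f}")
print(f"tokens per parameter: {tokens_per_parameter:.2f}")
```

With fewer training tokens than parameters, a loss this low may partly reflect memorization of the small corpus rather than general ability. That reading is an interpretation, not something the card reports.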

Usage

This model uses custom modeling code, so you must pass trust_remote_code=True.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Eeppa/TinyBuddy-500K"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

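prompt = "Once upon a time, there was a little girl named Lily."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Inference only, so skip gradient tracking.
# Note: with the standard Hugging Face generate(), temperature and top_k
# only take effect when do_sample=True is also passed.
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))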
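
For readers unfamiliar with the `temperature` and `top_k` arguments used above, here is a minimal pure-Python sketch of temperature plus top-k sampling over one step's logits. It illustrates the general technique only; it is not the transformers implementation, nor necessarily what this repository's custom code does. The function name `sample_top_k` and the toy logits are made up for the example.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id from raw logits (illustrative helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the ids of the k highest logits.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable, unnormalized softmax weights over the survivors.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
# With top_k=2, only ids 3 and 0 (the two largest logits) can be drawn.
print(sample_top_k(toy_logits, temperature=0.8, top_k=2))
```

Setting `top_k=1` reduces this to greedy decoding, since only the single highest-scoring token survives the filter.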

Limitations

This model is extremely small and was trained for a very short time on limited data.

What works:

Basic English patterns and short sentence structure
Simple story-like generation

What's broken:

Very limited coherence (output usually breaks down after 1–2 sentences)
High repetition
Poor long-range consistency
No real reasoning or factual knowledge
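
One simple way to quantify the repetition noted above is the fraction of repeated n-grams in a generated sample. The sketch below is a rough diagnostic under simple assumptions (whitespace tokenization, trigrams by default); `repeated_ngram_fraction` is a hypothetical helper, not part of this repository.

```python
def repeated_ngram_fraction(text, n=3):
    """Return the share of n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 mean heavy looping.
    Uses whitespace tokenization, which is an assumption for illustration.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Made-up example of looping output, not actual model output.
looping_sample = "the cat sat . the cat sat . the cat sat ."
print(repeated_ngram_fraction(looping_sample, n=3))
```

Running the same function over the decoded output of the usage example gives a quick, if crude, way to compare sampling settings.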

This model exists purely for educational purposes to explore the lower limits of language model size.

License

MIT

Citation

@misc{tinybuddy500k,
  title  = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
  year   = {2026},
  note   = {Educational demonstration of extremely small language models.}
}

Built with Llama.
