metadata
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt
- from-scratch
- tinystories
pipeline_tag: text-generation
mini-llm-tinystories
An 18.3M-parameter GPT-2-style language model trained completely from scratch on CPU (no pretrained weights). It generates short, coherent children's-story-style English.
Details
- Architecture: GPT-2 style decoder (pre-norm, GELU, weight-tied head)
- Params: ~18.3M (embedding dim 448, 7 heads, 6 layers, 256-token context)
- Tokenizer: byte-level BPE, 8192 vocab
- Training data: TinyStories (~90M tokens) + a small amount of Alpaca Q&A
- Training: ~7.6 hours on 2 CPU cores, final train loss ~1.86
- Type: base completion model (continues text; not instruction-tuned)
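The headline numbers above can be sanity-checked with a little arithmetic. This is a minimal sketch that assumes a standard GPT-2 block layout, which the card does not spell out: learned absolute position embeddings, biases on every linear layer, LayerNorm with gain and bias, and a 4x MLP expansion (1792 hidden units). The weight-tied head, as stated above, adds no output matrix. The function name `gpt2_param_count` is hypothetical. The last line converts the reported train loss to perplexity, assuming the loss is mean cross-entropy in nats (natural log), which is what PyTorch's `cross_entropy` computes.

```python
import math


def gpt2_param_count(vocab: int, d_model: int, n_layers: int, context: int, mlp_mult: int = 4) -> int:
    """Count parameters of a GPT-2-style decoder with a weight-tied LM head.

    Assumes learned position embeddings, biases on all linear layers,
    LayerNorm (gain + bias) and an MLP hidden size of mlp_mult * d_model.
    """
    d_ff = mlp_mult * d_model
    layer_norm = 2 * d_model                      # gain + bias
    attn = d_model * 3 * d_model + 3 * d_model    # fused Q, K, V projection
    attn += d_model * d_model + d_model           # attention output projection
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = 2 * layer_norm + attn + mlp       # pre-norm: one LayerNorm before attn, one before MLP

    embeddings = vocab * d_model + context * d_model  # token + position tables
    final_norm = layer_norm
    # The output head reuses the token embedding matrix, so it adds no parameters.
    return embeddings + n_layers * per_layer + final_norm


n = gpt2_param_count(vocab=8192, d_model=448, n_layers=6, context=256)
print(f"{n:,} parameters")                               # 18,271,232 parameters
print(f"head dim: {448 // 7}")                           # head dim: 64
print(f"perplexity at loss 1.86: {math.exp(1.86):.2f}")  # perplexity at loss 1.86: 6.42
```

Under these assumptions the total is 18,271,232, consistent with the ~18.3M quoted above. The number of heads does not change the count, because the fused Q, K, V projection is d_model x 3*d_model however it is split into heads.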
Usage
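```python
import json

import torch
from safetensors.torch import load_file
from tokenizers import ByteLevelBPETokenizer

from gpt2 import GPT2  # include gpt2.py from this repo

# Build the model from its config and load the trained weights.
with open("config.json") as f:
    cfg = json.load(f)
model = GPT2(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()

# Load the byte-level BPE tokenizer (8192-token vocab).
tk = ByteLevelBPETokenizer.from_file("tokenizer_bpe/vocab.json", "tokenizer_bpe/merges.txt")

# Encode a prompt, generate a continuation without tracking gradients, and decode it.
ids = tk.encode("Once upon a time").ids
with torch.no_grad():
    out = model.generate(torch.tensor([ids], dtype=torch.long), max_new_tokens=150)
print(tk.decode(out[0].tolist()))
```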
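`model.generate` is defined in the repo's `gpt2.py`, and this card does not document its sampling settings. As a rough illustration of what a base completion model does at inference time, here is a pure-Python sketch of an autoregressive loop that crops its input to the 256-token context window and samples each next token with temperature and top-k. The `next_token_logits` callable, the `temperature=0.8` and `top_k=40` defaults, and top-k sampling itself are assumptions for illustration; `gpt2.py` may decode differently. `toy_logits` is a hypothetical stand-in for the real network.

```python
import math
import random
from typing import Callable, List, Sequence

BLOCK_SIZE = 256  # context length from the Details section
VOCAB = 8192      # tokenizer vocab size from the Details section


def sample_top_k(logits: Sequence[float], temperature: float, top_k: int, rng: random.Random) -> int:
    """Sample one token id from the top_k highest logits after temperature scaling."""
    # Keep only the k highest-scoring candidate ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the kept logits (subtract the max for numerical stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    temperature: float = 0.8,  # hypothetical default
    top_k: int = 40,           # hypothetical default
    seed: int = 0,
) -> List[int]:
    """Autoregressively extend prompt_ids one token at a time."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The model can only attend to the most recent BLOCK_SIZE tokens.
        context = ids[-BLOCK_SIZE:]
        logits = next_token_logits(context)
        ids.append(sample_top_k(logits, temperature, top_k, rng))
    return ids


def toy_logits(context: List[int]) -> List[float]:
    """Hypothetical stand-in model: strongly favors (last token + 1)."""
    assert len(context) <= BLOCK_SIZE
    favored = (context[-1] + 1) % VOCAB
    return [10.0 if i == favored else 0.0 for i in range(VOCAB)]


print(generate(toy_logits, [1, 2, 3], max_new_tokens=5, top_k=1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With `top_k=1` the loop reduces to greedy decoding; a lower temperature or smaller `top_k` makes output more conservative, while larger values make it more varied.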
Limitations
Tiny model with no real-world knowledge. It works best for short narrative completions in the TinyStories style and will produce fluent but factually wrong text if asked questions.