---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt
- from-scratch
- tinystories
pipeline_tag: text-generation
---

# mini-llm-tinystories

An 18.3M-parameter GPT-2-style language model **trained completely from scratch on CPU** (no pretrained weights). It generates short, coherent English in the style of children's stories.

## Details

- **Architecture:** GPT-2 style decoder (pre-norm, GELU, weight-tied head)
- **Params:** ~18.3M (448 dim, 7 heads, 6 layers, 256 context)
- **Tokenizer:** byte-level BPE, 8192 vocab
- **Training data:** TinyStories (~90M tokens) + a small amount of Alpaca Q&A
- **Training:** ~7.6 hours on 2 CPU cores, final train loss ~1.86
- **Type:** base **completion** model (continues text; not instruction-tuned)

## Usage

```python
import json

import torch
from safetensors.torch import load_file
from tokenizers import ByteLevelBPETokenizer

from gpt2 import GPT2  # include gpt2.py from this repo

# Build the model from its config and load the trained weights
with open("config.json") as f:
    cfg = json.load(f)
model = GPT2(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()

# Byte-level BPE tokenizer files shipped with this repo
tk = ByteLevelBPETokenizer("tokenizer_bpe/vocab.json", "tokenizer_bpe/merges.txt")

# Encode a prompt and generate a continuation (no gradients needed)
ids = tk.encode("Once upon a time").ids
with torch.no_grad():
    out = model.generate(torch.tensor([ids]), max_new_tokens=150)
print(tk.decode(out[0].tolist()))
```

## Limitations

This is a tiny model with no real-world knowledge. It is best at short narrative completions in a TinyStories style. Because it is a base completion model, it will produce fluent but factually wrong text if asked questions.
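
## Parameter count

As a sanity check on the ~18.3M figure listed under Details, the sketch below tallies the parameters of a GPT-2-style decoder from the stated hyperparameters (8192 vocab, 256 context, 448 dim, 7 heads, 6 layers, weight-tied head). It is not taken from `gpt2.py`. It assumes the usual GPT-2 defaults that this card does not state: learned positional embeddings, a 4x MLP expansion, and biases on every linear layer and LayerNorm. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size=8192, n_ctx=256, d_model=448,
                     n_head=7, n_layer=6, mlp_ratio=4):
    """Count parameters of a GPT-2-style decoder with a weight-tied head.

    Assumptions (not confirmed by the model card): learned positional
    embeddings, 4x MLP expansion, biases on all linear layers and LayerNorms.
    """
    # The head count does not change the total, but it must divide d_model.
    assert d_model % n_head == 0, "d_model must be divisible by n_head"

    tok_emb = vocab_size * d_model   # shared with the output head (weight tying)
    pos_emb = n_ctx * d_model        # learned positions (assumption)

    # Attention: fused QKV projection plus output projection, with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: up-projection and down-projection, with biases.
    d_ff = mlp_ratio * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * (2 * d_model)

    final_norm = 2 * d_model
    return tok_emb + pos_emb + n_layer * (attn + mlp + norms) + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the total is 18,271,232, which rounds to the reported ~18.3M. About 3.7M of that is the token embedding, so weight tying saves a second matrix of the same size for the output head.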