---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt
- from-scratch
- tinystories
pipeline_tag: text-generation
---

# mini-llm-tinystories

An 18.3M-parameter GPT-2-style language model **trained completely from scratch on CPU**
(no pretrained weights). It generates short, coherent English text in the style of
children's stories.

## Details
- **Architecture:** GPT-2-style decoder-only transformer (pre-norm, GELU, weight-tied head)
- **Params:** ~18.3M (448-dim embeddings, 7 heads, 6 layers, 256-token context)
- **Tokenizer:** byte-level BPE, 8192-token vocabulary
- **Training data:** TinyStories (~90M tokens) plus a small amount of Alpaca Q&A
- **Training:** ~7.6 hours on 2 CPU cores; final train loss ~1.86
- **Type:** base **completion** model (continues text; not instruction-tuned)
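
As a sanity check on the parameter count, the figures listed above can be plugged into the standard GPT-2 layout. This is a sketch, not code from this repo: it assumes learned position embeddings, a 4x MLP expansion, and biases on every linear layer and LayerNorm, none of which the card states explicitly. `count_params` is a hypothetical helper name.

```python
def count_params(vocab=8192, dim=448, layers=6, context=256, mlp_ratio=4):
    """Parameter count for a GPT-2-style decoder with a weight-tied output head.

    Assumed (not stated in the model card): learned position embeddings,
    4x MLP expansion, and biases on all linear layers and LayerNorms.
    """
    tok_emb = vocab * dim            # shared with the output head (weight tying)
    pos_emb = context * dim          # learned absolute position embeddings
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)   # QKV + output projection
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)   # up + down projection
    norms = 2 * (2 * dim)            # two LayerNorms per block, weight + bias each
    final_norm = 2 * dim             # LayerNorm before the output head
    return tok_emb + pos_emb + layers * (attn + mlp + norms) + final_norm


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the script prints `18,271,232 parameters (~18.3M)`, which agrees with the figure quoted above. The number of heads does not appear in the formula because splitting the 448 dimensions across 7 heads (64 per head) does not change the size of the projections.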

## Usage
Requires `torch`, `safetensors`, and `tokenizers`, plus `gpt2.py` from this repo.

```python
import json

import torch
from safetensors.torch import load_file
from tokenizers import ByteLevelBPETokenizer

from gpt2 import GPT2  # model definition: gpt2.py from this repo

# Build the model from its config, then load the trained weights.
with open("config.json") as f:
    cfg = json.load(f)
model = GPT2(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()  # inference mode (disables dropout, if any)

# Load the byte-level BPE tokenizer and encode a story opening.
tk = ByteLevelBPETokenizer("tokenizer_bpe/vocab.json", "tokenizer_bpe/merges.txt")
ids = tk.encode("Once upon a time").ids

# Generate a continuation without tracking gradients.
with torch.no_grad():
    out = model.generate(torch.tensor([ids]), max_new_tokens=150)
print(tk.decode(out[0].tolist()))
```
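
The card lists a 256-token context but does not say whether `generate` crops long inputs itself. A cautious approach is to trim the prompt before calling it, so that the prompt plus the new tokens fit in the window. The helper below is a hypothetical sketch (`fit_prompt` is not part of this repo) that works on a plain list of token ids:

```python
CONTEXT_LEN = 256  # context length from the Details section


def fit_prompt(ids, max_new_tokens, context_len=CONTEXT_LEN):
    """Return the most recent prompt tokens that leave room for generation.

    Hypothetical helper: keeps len(prompt) + max_new_tokens <= context_len.
    """
    if not 0 < max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be between 1 and context_len - 1")
    budget = context_len - max_new_tokens  # tokens available for the prompt
    return ids[-budget:]                   # drop the oldest tokens first


# A 300-token prompt with 150 new tokens keeps only the last 106 prompt tokens.
long_prompt = list(range(300))
trimmed = fit_prompt(long_prompt, max_new_tokens=150)
print(len(trimmed), trimmed[0])
```

In the usage snippet this would wrap `ids` before it becomes a tensor, for example `torch.tensor([fit_prompt(ids, 150)])`. Short prompts such as "Once upon a time" pass through unchanged.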

## Limitations
This is a tiny model with no real-world knowledge. It works best for short narrative
completions in the TinyStories style. If asked questions, it will produce fluent but
factually wrong text.