---
license: apache-2.0
language:
  - en
tags:
  - text-generation
  - gpt
  - from-scratch
  - tinystories
pipeline_tag: text-generation
---

mini-llm-tinystories

An 18.3M-parameter GPT-2-style language model trained completely from scratch on CPU (no pretrained weights). It generates short, coherent children's-story-style English.

Details

  • Architecture: GPT-2-style decoder (pre-norm, GELU, weight-tied output head)
  • Params: ~18.3M (448-dim embeddings, 7 attention heads of 64 dims each, 6 layers, 256-token context)
  • Tokenizer: byte-level BPE, 8192 vocab
  • Training data: TinyStories (~90M tokens) + a small amount of Alpaca Q&A
  • Training: ~7.6 hours on 2 CPU cores, final train loss ~1.86
  • Type: base completion model (continues text; not instruction-tuned)
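As a sanity check, the ~18.3M figure can be reproduced by hand from the hyperparameters listed above. This is a sketch under assumptions the list does not spell out: learned absolute position embeddings, the standard GPT-2 MLP width of 4 × 448 = 1792, biases on every linear layer, two LayerNorms per block plus a final one, and an output head tied to the token embedding so that it adds no parameters. The last line converts the final train loss to perplexity (about 6.4), assuming the loss is mean per-token cross-entropy in nats.

```python
import math

# Hyperparameters from the Details list.
VOCAB, D_MODEL, N_HEADS, N_LAYERS, CONTEXT = 8192, 448, 7, 6, 256
# Assumption: standard GPT-2 MLP width of 4 * d_model.
D_FF = 4 * D_MODEL


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer (assumes biases are used)."""
    return n_in * n_out + n_out


def layernorm(d: int) -> int:
    """LayerNorm scale and shift vectors."""
    return 2 * d


per_block = (
    linear(D_MODEL, 3 * D_MODEL)   # fused Q, K, V projection
    + linear(D_MODEL, D_MODEL)     # attention output projection
    + linear(D_MODEL, D_FF)        # MLP up-projection
    + linear(D_FF, D_MODEL)        # MLP down-projection
    + 2 * layernorm(D_MODEL)       # pre-norm before attention and before the MLP
)

token_emb = VOCAB * D_MODEL        # reused as the output head (weight tying)
pos_emb = CONTEXT * D_MODEL        # assumption: learned absolute positions
total = token_emb + pos_emb + N_LAYERS * per_block + layernorm(D_MODEL)

head_dim = D_MODEL // N_HEADS      # 448 / 7 = 64 dims per attention head
print(f"params: {total:,} (~{total / 1e6:.1f}M), head_dim: {head_dim}")
# params: 18,271,232 (~18.3M), head_dim: 64

# Assumption: the reported loss is mean per-token cross-entropy in nats.
print(f"train perplexity ~ {math.exp(1.86):.2f}")
```

Under these assumptions weight tying matters for the headline number: an untied 8192 × 448 output head would add about 3.67M parameters.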

Usage

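```python
import json

import torch
from safetensors.torch import load_file
from tokenizers import ByteLevelBPETokenizer

from gpt2 import GPT2  # model definition: include gpt2.py from this repo

# Build the model from its saved hyperparameters and load the trained weights.
with open("config.json") as f:
    cfg = json.load(f)
model = GPT2(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()  # inference mode (e.g. disables dropout)

# Byte-level BPE tokenizer (8192 vocab) stored as vocab.json + merges.txt.
tk = ByteLevelBPETokenizer("tokenizer_bpe/vocab.json", "tokenizer_bpe/merges.txt")
ids = tk.encode("Once upon a time").ids

# Batch of one prompt as an int64 tensor; no gradients are needed for inference.
with torch.no_grad():
    out = model.generate(torch.tensor([ids], dtype=torch.long), max_new_tokens=150)
print(tk.decode(out[0].tolist()))
```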
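The decoding settings of `model.generate` live in the repo's gpt2.py and are not documented here. For readers who want to write their own loop, the sketch below shows the usual autoregressive procedure for a model with a 256-token context: feed at most the last 256 tokens, turn the next-token logits into a choice (greedy, or temperature plus top-k sampling), append it, and repeat. It is a hypothetical, framework-free illustration: `next_logits` stands in for a model call that returns one logit per vocabulary entry for the final position, and `sample_next` / `generate_ids` are illustrative names, not functions from gpt2.py.

```python
import math
import random
from typing import Callable, List, Optional

CONTEXT = 256  # context length from the Details list


def sample_next(logits: List[float], temperature: float = 1.0,
                top_k: Optional[int] = None,
                rng: Optional[random.Random] = None) -> int:
    """Choose the next token id from raw logits (hypothetical helper)."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    # Rank ids by score and keep only the top_k best (all ids if top_k is None).
    ranked = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)
    candidates = ranked[:top_k] if top_k else ranked
    # Softmax weights over the candidates; subtracting the max avoids overflow.
    best = scaled[candidates[0]]
    weights = [math.exp(scaled[i] - best) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate_ids(next_logits: Callable[[List[int]], List[float]],
                 prompt_ids: List[int], max_new_tokens: int,
                 context: int = CONTEXT, **sampling) -> List[int]:
    """Autoregressively extend prompt_ids by max_new_tokens token ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context:]  # the model never sees more than `context` tokens
        ids.append(sample_next(next_logits(window), **sampling))
    return ids


# Toy stand-in for the model: strongly prefers (last token + 1) mod 10.
def toy_logits(window: List[int]) -> List[float]:
    logits = [0.0] * 10
    logits[(window[-1] + 1) % 10] = 5.0
    return logits


print(generate_ids(toy_logits, [0], 5, context=3, temperature=0))  # [0, 1, 2, 3, 4, 5]
```

With a real checkpoint, `next_logits` would wrap a forward pass and return the last position's logits as a list. `temperature=0` gives repeatable greedy output, while sampling with a temperature and `top_k` trades repeatability for variety.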

Limitations

This is a tiny model with no real-world knowledge. It works best for short narrative completions in the TinyStories style. When asked questions, it will produce fluent but factually wrong text.