Tiny GPT

Tiny GPT is an educational decoder-only Transformer trained from scratch on the TinyStories dataset. The implementation is intentionally small and readable.

Model details

Architecture: decoder-only causal language model
Context length: 512 tokens
Vocabulary size: 10,000
Hidden size: 256
Transformer layers: 6
Attention heads: 8
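
The values above can be collected in a small configuration object. This is a minimal sketch, not the repository's actual code: the class name `TinyGPTConfig`, its field names, and the helper `truncate_to_context` are hypothetical. It illustrates two consequences of the listed numbers, assuming standard multi-head attention: each head works on 256 / 8 = 32 dimensions, and a prompt longer than 512 tokens would typically need to be shortened before a forward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyGPTConfig:
    """Hypothetical container for the hyperparameters listed above."""

    context_length: int = 512
    vocab_size: int = 10_000
    hidden_size: int = 256
    num_layers: int = 6
    num_heads: int = 8

    @property
    def head_dim(self) -> int:
        # Standard multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


def truncate_to_context(token_ids: list[int], config: TinyGPTConfig) -> list[int]:
    """Keep only the most recent tokens that fit in the context window."""
    return token_ids[-config.context_length:]


config = TinyGPTConfig()
print(config.head_dim)  # 32
print(len(truncate_to_context(list(range(600)), config)))  # 512
```

Keeping the most recent tokens rather than the earliest ones is a design choice made for this sketch; it suits story continuation, where the latest text matters most for the next token.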

Source code: https://github.com/alainbrown/tiny-gpt

Usage

This repository contains custom Transformers code. Review it before enabling trust_remote_code.

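import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "alainbrown/tiny-gpt"
# trust_remote_code=True runs the repository's custom model code on your machine.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()  # switch to inference mode (disables dropout, if any)

# Tokenize a prompt and run one forward pass without tracking gradients.
inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # expected shape: (batch, sequence_length, vocab_size)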
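
The logits from the snippet above are scores over the vocabulary for each position; the scores at the last position determine the next token. The following is a minimal, dependency-free sketch of that selection step, not code from the repository. The function names are hypothetical, and the toy logits stand in for something like `logits[0, -1].tolist()`.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw logits into probabilities, with optional temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(logits: list[float]) -> int:
    """Pick the index of the largest logit (deterministic decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_next_token(logits: list[float], temperature: float = 1.0, rng=None) -> int:
    """Draw one token index at random, weighted by the softmax probabilities."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits over a 4-token vocabulary (illustrative values only).
toy_logits = [0.5, 2.0, -1.0, 0.1]
print(greedy_next_token(toy_logits))  # 1
```

Greedy decoding tends to make repetition more likely in small models, so sampling with a moderate temperature is a common alternative when experimenting.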

Intended use

This model is intended for education and experimentation. It is not intended for production, factual question answering, or safety-critical applications.

Limitations

The model is small, trained on synthetic children's stories, and has not been comprehensively evaluated. It may produce incoherent, repetitive, incorrect, or inappropriate text. English is the only supported language.

Training

The training pipeline is available in the linked GitHub repository. This model repository contains inference files only; optimizer state and training-progress state are excluded.
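
As an illustration of what leaving out training state can look like, here is a minimal sketch that keeps only the weights from a checkpoint dictionary. It does not describe the linked pipeline: the function name and the keys `model_state_dict`, `optimizer_state_dict`, and `step` are hypothetical, and a plain dictionary stands in for a real checkpoint.

```python
def strip_training_state(checkpoint: dict, weight_key: str = "model_state_dict") -> dict:
    """Return a copy of the model weights, dropping everything else."""
    if weight_key not in checkpoint:
        raise KeyError(f"checkpoint has no '{weight_key}' entry")
    return dict(checkpoint[weight_key])


# Toy checkpoint with hypothetical keys and illustrative values.
toy_checkpoint = {
    "model_state_dict": {"embed.weight": [0.1, 0.2]},
    "optimizer_state_dict": {"lr": 3e-4},
    "step": 1000,
}

inference_only = strip_training_state(toy_checkpoint)
print(sorted(inference_only))  # ['embed.weight']
```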


Weights format: Safetensors
Model size: 24.3M params
Tensor type: F32
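
The parameter count and tensor type give a rough estimate of the weight file size, since each F32 value takes 4 bytes. The sketch below is back-of-the-envelope arithmetic only: 24.3M is a rounded figure, file-format header overhead is ignored, and the function name is hypothetical.

```python
BYTES_PER_F32 = 4  # a 32-bit float occupies 4 bytes


def estimate_weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_F32) -> int:
    """Estimate raw tensor storage, ignoring file-format overhead."""
    return num_params * bytes_per_param


approx_params = 24_300_000  # rounded parameter count from the model card
size_bytes = estimate_weight_bytes(approx_params)
print(f"{size_bytes / 1e6:.1f} MB")  # 97.2 MB
print(f"{size_bytes / 2**20:.1f} MiB")  # 92.7 MiB
```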
