PicoLM-15M

A ~19M-parameter GPT-2-style causal language model pretrained from scratch on a mix of TinyStories and FineWeb web text. Trained in roughly 45 minutes on a single NVIDIA T4 GPU.

Model Details

Property         Value
---------------  --------------------------------
Architecture     GPT-2 (decoder-only transformer)
Parameters       ~19M
Context length   512 tokens
Vocabulary size  49,152
Layers           8
Attention heads  8
Hidden size      256
FFN size         1024
Tokenizer        SmolLM2-135M (HuggingFaceTB)
Training steps   8,000
Final loss       ~3.6–4.2
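
The ~19M figure can be roughly reproduced from the table above. This is a back-of-the-envelope sketch assuming standard GPT-2 layers with biases and a weight-tied output head (the exact breakdown is an assumption, not taken from the training code):

```python
# Architecture constants from the Model Details table
V, L_CTX, N_LAYERS, D, D_FF = 49152, 512, 8, 256, 1024

tok_emb = V * D                            # token embeddings (tied with the LM head)
pos_emb = L_CTX * D                        # learned positional embeddings
attn = 4 * (D * D + D)                     # Q, K, V, output projections + biases
ffn = (D * D_FF + D_FF) + (D_FF * D + D)   # two linear layers with biases
lns = 2 * (2 * D)                          # two LayerNorms (weight + bias) per block
per_block = attn + ffn + lns
final_ln = 2 * D

total = tok_emb + pos_emb + N_LAYERS * per_block + final_ln
print(f"{total / 1e6:.1f}M")  # → 19.0M
```

Note that the token embedding matrix alone (49,152 × 256 ≈ 12.6M) accounts for about two thirds of the parameters, which is typical at this scale.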

Training

Hardware: Google Colab, NVIDIA T4 (15GB VRAM)

Dataset mix:

  • 75% TinyStories — simple English stories
  • 25% FineWeb (sample-10BT) — deduplicated Common Crawl web text

Training config:

  • Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
  • LR schedule: Cosine with 400 warmup steps
  • Batch size: 16 × 2 grad accum = effective batch 32
  • Mixed precision: fp16
  • Streaming: yes (no full dataset download)
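
The LR schedule above can be sketched in plain Python. This is a sketch assuming linear warmup to the peak lr=3e-4 over 400 steps followed by cosine decay to zero over the remaining 7,600 steps; the exact decay floor used in training is not stated and may differ:

```python
import math

BASE_LR = 3e-4   # peak learning rate from the config
WARMUP = 400     # warmup steps
TOTAL = 8000     # total training steps

def lr_at(step):
    """Cosine schedule with linear warmup (assumed decay target: 0)."""
    if step < WARMUP:
        return BASE_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(200), lr_at(400), lr_at(8000))  # ramps to 3e-4, then decays to 0
```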

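The effective-batch arithmetic (16 × 2 = 32) relies on a standard identity: for a mean-reduced loss, averaging the gradients of two micro-batches of 16 equals the gradient over the full batch of 32. A minimal sketch with a scalar MSE model (the model and data here are illustrative, not from the training run):

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error d/dw mean((w*x - y)^2) over a batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = list(range(32))
ys = [2.0 * x for x in xs]

# One full batch of 32 vs. two accumulated micro-batches of 16
full = grad_mse(w, xs, ys)
accum = 0.5 * (grad_mse(w, xs[:16], ys[:16]) + grad_mse(w, xs[16:], ys[16:]))
print(abs(full - accum))  # essentially zero
```

This is why gradient accumulation lets a 15GB T4 simulate a larger batch than fits in memory at once.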
Usage

from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

# Load the tokenizer and weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-15M")
model = GPT2LMHeadModel.from_pretrained("Tralalabs/PicoLM-15M")

# Sample up to 100 new tokens; temperature=0.8 softens the sampling distribution
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = gen("Once upon a time", max_new_tokens=100, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])

Sample Outputs

Prompt: Once upon a time

Once upon a time, there was a little girl named Lily. She loved to play outside and play with her ball. One day, she's friend Lily came to play outside...

Prompt: The history of the internet

The history of the internet. And the new world we have found in the last year of 110 in the world. The group of the people from the American leaders...

Prompt: Artificial intelligence is

Artificial intelligence is not good, but not even not yet in order to bring on the world of the world...

Limitations

  • Small scale (19M params) — outputs are often repetitive or incoherent on complex prompts
  • Not instruction-tuned — this is a base pretrained model only
  • Undertrained relative to Chinchilla optimal (~300M tokens seen vs ~570M recommended)
  • Best suited for simple narrative/story generation due to TinyStories bias

Intended Use

  • Educational — learning how pretraining works
  • Baseline for fine-tuning experiments
  • Research on small language model behavior

Future Plans

  • PicoLM-15M-v2 with more training steps (12,000) and an improved LR schedule
  • Instruction fine-tuning variant

Citation

@misc{picolm2026,
  author = {Tralalabs},
  title = {PicoLM-15M: A Small GPT-style Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Tralalabs/PicoLM-15M}
}