PicoLM-15M

A ~19M-parameter GPT-2-style causal language model pretrained from scratch on a mix of TinyStories and FineWeb web text. Trained in roughly 45 minutes on a single NVIDIA T4 GPU.

Model Details

Property         Value
---------------  --------------------------------
Architecture     GPT-2 (decoder-only transformer)
Parameters       ~19M
Context length   512 tokens
Vocabulary size  49,152
Layers           8
Attention heads  8
Hidden size      256
FFN size         1024
Tokenizer        SmolLM2-135M (HuggingFaceTB)
Training steps   8,000
Final loss       ~3.6–4.2
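
The ~19M figure can be roughly reproduced from the table above. This is a back-of-the-envelope sketch assuming standard GPT-2 layers with biases and a weight-tied output head (the exact breakdown is an assumption, not taken from the training code):

```python
# Architecture constants from the Model Details table
V, L_CTX, N_LAYERS, D, D_FF = 49152, 512, 8, 256, 1024

tok_emb = V * D                            # token embeddings (tied with the LM head)
pos_emb = L_CTX * D                        # learned positional embeddings
attn = 4 * (D * D + D)                     # Q, K, V, output projections + biases
ffn = (D * D_FF + D_FF) + (D_FF * D + D)   # two linear layers with biases
lns = 2 * (2 * D)                          # two LayerNorms (weight + bias) per block
per_block = attn + ffn + lns
final_ln = 2 * D

total = tok_emb + pos_emb + N_LAYERS * per_block + final_ln
print(f"{total / 1e6:.1f}M")  # → 19.0M
```

Note that the token embedding matrix alone (49,152 × 256 ≈ 12.6M) accounts for about two thirds of the parameters, which is typical at this scale.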

Training

Hardware: Google Colab, NVIDIA T4 (15GB VRAM)

Dataset mix:

  • 75% TinyStories — simple English stories
  • 25% FineWeb (sample-10BT) — deduplicated Common Crawl web text

Training config:

  • Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
  • LR schedule: Cosine with 400 warmup steps
  • Batch size: 16 × 2 grad accum = effective batch 32
  • Mixed precision: fp16
  • Streaming: yes (no full dataset download)
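
The LR schedule above can be sketched in plain Python. This is a sketch assuming linear warmup to the peak lr=3e-4 over 400 steps followed by cosine decay to zero over the remaining 7,600 steps; the exact decay floor used in training is not stated and may differ:

```python
import math

BASE_LR = 3e-4   # peak learning rate from the config
WARMUP = 400     # warmup steps
TOTAL = 8000     # total training steps

def lr_at(step):
    """Cosine schedule with linear warmup (assumed decay target: 0)."""
    if step < WARMUP:
        return BASE_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(200), lr_at(400), lr_at(8000))  # ramps to 3e-4, then decays to 0
```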

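The effective-batch arithmetic (16 × 2 = 32) relies on a standard identity: for a mean-reduced loss, averaging the gradients of two micro-batches of 16 equals the gradient over the full batch of 32. A minimal sketch with a scalar MSE model (the model and data here are illustrative, not from the training run):

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error d/dw mean((w*x - y)^2) over a batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = list(range(32))
ys = [2.0 * x for x in xs]

# One full batch of 32 vs. two accumulated micro-batches of 16
full = grad_mse(w, xs, ys)
accum = 0.5 * (grad_mse(w, xs[:16], ys[:16]) + grad_mse(w, xs[16:], ys[16:]))
print(abs(full - accum))  # essentially zero
```

This is why gradient accumulation lets a 15GB T4 simulate a larger batch than fits in memory at once.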
Usage

from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

# Load the tokenizer and weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-15M")
model = GPT2LMHeadModel.from_pretrained("Tralalabs/PicoLM-15M")

# Sample up to 100 new tokens; temperature=0.8 softens the sampling distribution
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = gen("Once upon a time", max_new_tokens=100, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])

Sample Outputs

Prompt: Once upon a time

Once upon a time, there was a little girl named Lily. She loved to play outside and play with her ball. One day, she's friend Lily came to play outside...

Prompt: The history of the internet

The history of the internet. And the new world we have found in the last year of 110 in the world. The group of the people from the American leaders...

Prompt: Artificial intelligence is

Artificial intelligence is not good, but not even not yet in order to bring on the world of the world...

Limitations

  • Small scale (19M params) — outputs are often repetitive or incoherent on complex prompts
  • Not instruction-tuned — this is a base pretrained model only
  • Undertrained relative to Chinchilla optimal (~300M tokens seen vs ~570M recommended)
  • Best suited for simple narrative/story generation due to TinyStories bias

Intended Use

  • Educational — learning how pretraining works
  • Baseline for fine-tuning experiments
  • Research on small language model behavior

Future Plans

  • PicoLM-15M-v2 with more training steps (12,000) and an improved LR schedule
  • Instruction fine-tuning variant

Citation

@misc{picolm2026,
  author = {Tralalabs},
  title = {PicoLM-15M: A Small GPT-style Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Tralalabs/PicoLM-15M}
}