---
license: apache-2.0
datasets:
- pico-lm/pretokenized-dolma
language:
- en
metrics:
- pico-lm/perplexity
pipeline_tag: text-generation
---

# Pico Decoder Large

**pico-decoder-large** is the largest model (570M parameters) in the current `pico-decoder` suite. It is a full-scale research model designed for in-depth interpretability studies of transformer learning. Trained with [`pico-train`](https://github.com/pico-lm) and fully compatible with [`pico-analyze`](https://github.com/pico-lm), it offers rich checkpointing and analytical insight into large-scale LM behavior.
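
To load the model, a minimal sketch, assuming the checkpoint works with the standard `transformers` auto-classes (if it ships custom modeling code, `trust_remote_code=True` may be required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id inferred from the model name.
model_id = "pico-lm/pico-decoder-large"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick generation smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```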

## Model Details

| Field                 | Value                                   |
|-----------------------|-----------------------------------------|
| **Architecture**      | Decoder-only transformer (LLaMA-style)  |
| **Parameters**        | 570M                                    |
| **Layers**            | 12                                      |
| **Hidden Size**       | 1536                                    |
| **Feed-Forward Size** | 6144                                    |
| **Attention Heads**   | 12                                      |
| **Key/Value Heads**   | 4                                       |
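
The 570M figure can be sanity-checked from the table above. The sketch below assumes a vocabulary of 50,304 tokens, untied input/output embeddings, and a SwiGLU feed-forward block; none of these is stated in this card:

```python
# Back-of-the-envelope parameter count from the table above.
n_layers, d_model, d_ff = 12, 1536, 6144
n_heads, n_kv_heads = 12, 4
vocab_size = 50304  # assumption, not listed in the card

d_head = d_model // n_heads   # 128
kv_dim = n_kv_heads * d_head  # 512 (grouped-query attention)

attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q, O + K, V projections
ffn = 3 * d_model * d_ff                             # gate, up, down (SwiGLU)
per_layer = attn + ffn

embeddings = 2 * vocab_size * d_model  # assumes untied embeddings
total = n_layers * per_layer + embeddings
print(f"~{total / 1e6:.0f}M parameters")  # ~570M
```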

## Training

- **Dataset**: [`pretokenized-dolma`](https://huggingface.co/datasets/pico-lm/pretokenized-dolma)
- **Training steps**: 200,000
- **Batch size**: 1024
- **Sequence length**: 2048
- **Optimizer**: AdamW
- **Learning rate schedule**: Linear decay with warmup (see the sketch below)
- **Compute**: 16 A100-SXM4-80GB GPUs
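
A minimal sketch of the schedule named above; the peak learning rate and warmup length are illustrative assumptions, as the card does not state them:

```python
def lr_at_step(step, max_steps=200_000, peak_lr=3e-4, warmup_steps=2_000):
    """Linear warmup followed by linear decay to zero.

    peak_lr and warmup_steps are assumed values; the card only says
    'linear decay with warmup', not the exact hyperparameters.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(max_steps - step, 0)
    return peak_lr * remaining / (max_steps - warmup_steps)

print(lr_at_step(1_000))    # mid-warmup: half the peak rate
print(lr_at_step(100_000))  # mid-decay: roughly half the peak rate
```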

## Evaluation and Analysis

This model supports fine-grained analysis with [`pico-analyze`](https://github.com/pico-lm), which lets researchers trace how learning unfolds over the course of training, even at very small scales.
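
If the intermediate checkpoints mentioned above are published as Hugging Face Hub revisions, learning dynamics can be traced by loading the model at several points in training. The `step-{N}` branch naming below is hypothetical:

```python
from transformers import AutoModelForCausalLM

# Load the model at several training steps; the revision names are a
# guess at the checkpointing convention, not confirmed by this card.
for step in (1_000, 10_000, 100_000):
    model = AutoModelForCausalLM.from_pretrained(
        "pico-lm/pico-decoder-large",
        revision=f"step-{step}",  # hypothetical revision name
    )
    # ... run probes / analyses on `model` here ...
```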

We also evaluate the model's perplexity on the [`pretokenized-paloma-tinsy`](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) dataset.
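
A minimal sketch of such an evaluation; the split name and the `input_ids` column are assumptions about the dataset's schema:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("pico-lm/pico-decoder-large")
model.eval()

# Split name and column name are assumed, not documented here.
data = load_dataset("pico-lm/pretokenized-paloma-tinsy", split="val")

nll_sum, token_count = 0.0, 0
with torch.no_grad():
    for row in data.select(range(100)):  # small subsample for illustration
        ids = torch.tensor(row["input_ids"]).unsqueeze(0)
        out = model(ids, labels=ids)
        n = ids.numel() - 1  # causal LM loss is averaged over n-1 targets
        nll_sum += out.loss.item() * n
        token_count += n

print("perplexity:", torch.exp(torch.tensor(nll_sum / token_count)).item())
```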

## Citation

```bibtex
@software{pico2025,
  author = {Diehl Martinez, Richard},
  title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
  year = {2025},
  url = {https://github.com/pico-lm}
}
```