---
license: apache-2.0
datasets:
- pico-lm/pretokenized-dolma
language:
- en
metrics:
- pico-lm/perplexity
pipeline_tag: text-generation
---

# Pico Decoder Small

**pico-decoder-small** is a 65M-parameter model in the `pico-decoder` suite: a lightweight, LLaMA-style decoder-only transformer trained from scratch using [`pico-train`](https://github.com/pico-lm/pico-train). It is designed for transparent and reproducible research into the learning dynamics of language models, and is fully compatible with the `pico-analyze` toolkit for detailed interpretability analysis.

## Model Details

| Field | Value |
|---------------------|------------------------------------|
| **Architecture** | Decoder-only transformer (LLaMA-style) |
| **Parameters** | 65M |
| **Layers** | 12 |
| **Hidden Size** | 384 |
| **Feed-Forward Size** | 1536 |
| **Attention Heads** | 12 |
| **Key/Value Heads** | 4 |

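As a rough sanity check on the figures in the Model Details table, the sketch below derives the per-head dimension and the grouped-query layout implied by 12 attention heads sharing 4 key/value heads, then estimates the weights held in the transformer blocks. It assumes a LLaMA-style block with bias-free Q/K/V/O projections and a gated feed-forward layer with three weight matrices. Those details, the function names, and the omission of norm and embedding weights are assumptions for illustration, not taken from the `pico-train` source.

```python
# Values from the Model Details table.
N_LAYERS = 12
HIDDEN = 384
FFN = 1536
N_HEADS = 12
N_KV_HEADS = 4


def attention_layout(hidden, n_heads, n_kv_heads):
    """Return (head_dim, query heads per KV head, KV projection width)."""
    head_dim = hidden // n_heads
    group_size = n_heads // n_kv_heads
    kv_width = n_kv_heads * head_dim
    return head_dim, group_size, kv_width


def block_params(hidden, ffn, n_heads, n_kv_heads):
    """Weights in one block, ASSUMING bias-free Q/K/V/O projections and a
    gated feed-forward layer (gate, up, down); norm weights are ignored."""
    _, _, kv_width = attention_layout(hidden, n_heads, n_kv_heads)
    attn = 2 * hidden * hidden + 2 * hidden * kv_width  # Q and O, then K and V
    mlp = 3 * hidden * ffn
    return attn + mlp


head_dim, group_size, kv_width = attention_layout(HIDDEN, N_HEADS, N_KV_HEADS)
non_embedding = N_LAYERS * block_params(HIDDEN, FFN, N_HEADS, N_KV_HEADS)
print(head_dim, group_size, kv_width)  # 32 3 128
print(f"{non_embedding:,}")            # 25,952,256
```

Under these assumptions the blocks account for roughly 26M of the 65M parameters. The remainder would sit in the token embedding and output layers, whose size depends on the vocabulary size, which this card does not state.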
## Training

- **Dataset**: [`pretokenized-dolma`](https://huggingface.co/datasets/pico-lm/pretokenized-dolma), English-only
- **Training steps**: 200,000
- **Batch size**: 1024
- **Sequence length**: 2048
- **Optimizer**: AdamW
- **Learning rate schedule**: Linear decay with warmup
- **Compute**: 16 A100-SXM4-80GB GPUs

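To make the scale of the run concrete, the following sketch multiplies the training settings listed in this section into a token budget and illustrates one common shape for a linear schedule with warmup. It assumes the batch size counts sequences rather than tokens, and that the learning rate decays to zero at the final step. `PEAK_LR` and `WARMUP` are hypothetical placeholders, since the card does not state the peak learning rate or the number of warmup steps.

```python
TRAINING_STEPS = 200_000
BATCH_SIZE = 1024       # ASSUMPTION: measured in sequences per step
SEQUENCE_LENGTH = 2048  # tokens per sequence

tokens_per_step = BATCH_SIZE * SEQUENCE_LENGTH
total_tokens = TRAINING_STEPS * tokens_per_step


def linear_warmup_decay(step, peak_lr, warmup_steps, total_steps):
    """Linear ramp from 0 to peak_lr, then linear decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(f"{tokens_per_step:,} tokens per step")     # 2,097,152 tokens per step
print(f"{total_tokens / 1e9:.1f}B tokens total")  # 419.4B tokens total

# Hypothetical values, for illustration only.
PEAK_LR, WARMUP = 1e-3, 1_000
for step in (0, 500, 1_000, 100_500, 200_000):
    print(step, linear_warmup_decay(step, PEAK_LR, WARMUP, TRAINING_STEPS))
```

If the batch size were instead measured in tokens, the total would shrink by a factor of the sequence length, so the sequence-based reading should be checked against the `pico-train` configuration before quoting the figure.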
## Evaluation and Analysis

This model supports fine-grained analysis using [`pico-analyze`](https://github.com/pico-lm/pico-analyze), a tool that lets researchers study how learning unfolds over training, even at modest scales.

We also evaluate the model's perplexity on the [`pretokenized-paloma-tinsy`](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) dataset.

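For readers unfamiliar with the metric, perplexity is the exponential of the average per-token negative log-likelihood. The sketch below computes it from a list of token log-probabilities. The function name and the toy inputs are hypothetical, and this is not the `pico-lm/perplexity` implementation.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability over tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy example: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))  # 4.0
```

Lower values indicate that the model assigns higher probability to the evaluation text, and a perplexity of 1 would mean every token was predicted with certainty.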
## Citation

If you use `pico-decoder-small` or any other `pico-decoder` model in your research, please cite:

```bibtex
@software{pico2025,
  author = {Diehl Martinez, Richard},
  title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
  year = {2025},
  url = {https://github.com/pico-lm}
}