---
license: apache-2.0
datasets:
- pico-lm/pretokenized-dolma
language:
- en
metrics:
- pico-lm/perplexity
pipeline_tag: text-generation
---

# Pico Decoder Large

**pico-decoder-large** is the largest model (570M parameters) in the current `pico-decoder` suite. It is a full-scale research model designed for in-depth interpretability studies of transformer learning. Trained with [`pico-train`](https://github.com/pico-lm) and fully compatible with [`pico-analyze`](https://github.com/pico-lm), it offers rich checkpointing and analytical insight into large-scale LM behavior.
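
To load the model, a minimal sketch, assuming the checkpoint works with the standard `transformers` auto-classes (if it ships custom modeling code, `trust_remote_code=True` may be required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id inferred from the model name.
model_id = "pico-lm/pico-decoder-large"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick generation smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```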

## Model Details

| Field                 | Value                                   |
|-----------------------|-----------------------------------------|
| **Architecture**      | Decoder-only transformer (LLaMA-style)  |
| **Parameters**        | 570M                                    |
| **Layers**            | 12                                      |
| **Hidden Size**       | 1536                                    |
| **Feed-Forward Size** | 6144                                    |
| **Attention Heads**   | 12                                      |
| **Key/Value Heads**   | 4                                       |
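
The 570M figure can be sanity-checked from the table above. The sketch below assumes a vocabulary of 50,304 tokens, untied input/output embeddings, and a SwiGLU feed-forward block; none of these is stated in this card:

```python
# Back-of-the-envelope parameter count from the table above.
n_layers, d_model, d_ff = 12, 1536, 6144
n_heads, n_kv_heads = 12, 4
vocab_size = 50304  # assumption, not listed in the card

d_head = d_model // n_heads   # 128
kv_dim = n_kv_heads * d_head  # 512 (grouped-query attention)

attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q, O + K, V projections
ffn = 3 * d_model * d_ff                             # gate, up, down (SwiGLU)
per_layer = attn + ffn

embeddings = 2 * vocab_size * d_model  # assumes untied embeddings
total = n_layers * per_layer + embeddings
print(f"~{total / 1e6:.0f}M parameters")  # ~570M
```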

## Training

- **Dataset**: [`pretokenized-dolma`](https://huggingface.co/datasets/pico-lm/pretokenized-dolma)
- **Training steps**: 200,000
- **Batch size**: 1024
- **Sequence length**: 2048
- **Optimizer**: AdamW
- **Learning rate schedule**: Linear decay with warmup (see the sketch below)
- **Compute**: 16 A100-SXM4-80GB GPUs
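
A minimal sketch of the schedule named above; the peak learning rate and warmup length are illustrative assumptions, as the card does not state them:

```python
def lr_at_step(step, max_steps=200_000, peak_lr=3e-4, warmup_steps=2_000):
    """Linear warmup followed by linear decay to zero.

    peak_lr and warmup_steps are assumed values; the card only says
    'linear decay with warmup', not the exact hyperparameters.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(max_steps - step, 0)
    return peak_lr * remaining / (max_steps - warmup_steps)

print(lr_at_step(1_000))    # mid-warmup: half the peak rate
print(lr_at_step(100_000))  # mid-decay: roughly half the peak rate
```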

## Evaluation and Analysis

This model supports fine-grained analysis with [`pico-analyze`](https://github.com/pico-lm), which lets researchers trace how learning unfolds over the course of training, even at very small scales.
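
If the intermediate checkpoints mentioned above are published as Hugging Face Hub revisions, learning dynamics can be traced by loading the model at several points in training. The `step-{N}` branch naming below is hypothetical:

```python
from transformers import AutoModelForCausalLM

# Load the model at several training steps; the revision names are a
# guess at the checkpointing convention, not confirmed by this card.
for step in (1_000, 10_000, 100_000):
    model = AutoModelForCausalLM.from_pretrained(
        "pico-lm/pico-decoder-large",
        revision=f"step-{step}",  # hypothetical revision name
    )
    # ... run probes / analyses on `model` here ...
```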

We also evaluate the model's perplexity on the [`pretokenized-paloma-tinsy`](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) dataset.
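
A minimal sketch of such an evaluation; the split name and the `input_ids` column are assumptions about the dataset's schema:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("pico-lm/pico-decoder-large")
model.eval()

# Split name and column name are assumed, not documented here.
data = load_dataset("pico-lm/pretokenized-paloma-tinsy", split="val")

nll_sum, token_count = 0.0, 0
with torch.no_grad():
    for row in data.select(range(100)):  # small subsample for illustration
        ids = torch.tensor(row["input_ids"]).unsqueeze(0)
        out = model(ids, labels=ids)
        n = ids.numel() - 1  # causal LM loss is averaged over n-1 targets
        nll_sum += out.loss.item() * n
        token_count += n

print("perplexity:", torch.exp(torch.tensor(nll_sum / token_count)).item())
```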

## Citation

```bibtex
@software{pico2025,
  author = {Diehl Martinez, Richard},
  title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
  year = {2025},
  url = {https://github.com/pico-lm}
}
```