---
license: mit
language: en
---

A micro GPT implementation and training pipeline in PyTorch.

I built this to understand a few things:

- The basics of attention and RoPE (see the RoPE sketch below)
- Training a GPT-like model on multiple GPUs, including checkpointing and other considerations needed to make the run successful (see the checkpointing sketch below)
- Multi-phase training, including combining ("souping") the model weights for the second stage from 3 runs on smaller amounts of high-quality data (see the souping sketch below)

This repository contains the pretrained model and tokenizer. See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.
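A minimal sketch of rotary position embeddings (RoPE), one of the topics listed above. The tensor layout, `head_dim` pairing, and `base` value are illustrative assumptions, not necessarily what this repo's implementation does:

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation angle per pair of channels, scaled by position.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    pos = torch.arange(seq_len).float()
    return torch.outer(pos, inv_freq)  # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (batch, n_heads, seq_len, head_dim). Rotate each channel pair
    # (x1, x2) by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()  # broadcast over batch and heads
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

RoPE is applied to the queries and keys before the attention dot product, so the score between two tokens ends up depending on their relative offset rather than absolute positions.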
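A minimal sketch of checkpointing in a multi-GPU DDP run, the second topic above. The checkpoint contents, path, and barrier placement are assumptions about one reasonable setup, not this repo's exact training loop:

```python
import torch
import torch.distributed as dist

def save_checkpoint(step: int, model, optimizer, path: str = "ckpt.pt") -> None:
    # Only rank 0 writes; unwrap the DDP wrapper so the checkpoint can
    # also be loaded into a plain, non-distributed model later.
    if dist.get_rank() == 0:
        torch.save(
            {
                "step": step,
                "model": model.module.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            path,
        )
    dist.barrier()  # keep all ranks in sync around the save

def load_checkpoint(model, optimizer, path: str = "ckpt.pt") -> int:
    # Load on CPU first to avoid every rank allocating GPU memory at once.
    ckpt = torch.load(path, map_location="cpu")
    model.module.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

Saving the optimizer state alongside the weights is what lets a crashed run resume mid-schedule instead of restarting the learning-rate and momentum state from scratch.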
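And a minimal sketch of the souping step from the third topic: a uniform average of the weights from the 3 second-stage runs. The filenames are hypothetical, and this assumes each file is a raw `state_dict` with identical keys:

```python
import torch

def soup(paths: list[str]) -> dict[str, torch.Tensor]:
    # Uniform "model soup": average each parameter across the runs.
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Hypothetical usage with the three stage-two runs:
# souped = soup(["run1.pt", "run2.pt", "run3.pt"])
# model.load_state_dict(souped)
```

Averaging works here because all 3 runs start from the same stage-one weights, so the fine-tuned models stay close enough in parameter space for their mean to remain a good model.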