---
license: mit
language: en
---

A micro GPT implementation and training pipeline in PyTorch.

I built this to understand a few things:

- The basics of attention and RoPE (see the RoPE sketch below)
- Training a GPT-like model on multiple GPUs, including checkpointing and other considerations needed to make the run successful (see the checkpointing sketch below)
- Multi-phase training, including combining ("souping") the model weights for the second stage from 3 runs on smaller amounts of high-quality data (see the souping sketch below)

This repository contains the pretrained model and tokenizer. See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.
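A minimal sketch of rotary position embeddings (RoPE), one of the topics listed above. The tensor layout, `head_dim` pairing, and `base` value are illustrative assumptions, not necessarily what this repo's implementation does:

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation angle per pair of channels, scaled by position.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    pos = torch.arange(seq_len).float()
    return torch.outer(pos, inv_freq)  # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (batch, n_heads, seq_len, head_dim). Rotate each channel pair
    # (x1, x2) by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()  # broadcast over batch and heads
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

RoPE is applied to the queries and keys before the attention dot product, so the score between two tokens ends up depending on their relative offset rather than absolute positions.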
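A minimal sketch of checkpointing in a multi-GPU DDP run, the second topic above. The checkpoint contents, path, and barrier placement are assumptions about one reasonable setup, not this repo's exact training loop:

```python
import torch
import torch.distributed as dist

def save_checkpoint(step: int, model, optimizer, path: str = "ckpt.pt") -> None:
    # Only rank 0 writes; unwrap the DDP wrapper so the checkpoint can
    # also be loaded into a plain, non-distributed model later.
    if dist.get_rank() == 0:
        torch.save(
            {
                "step": step,
                "model": model.module.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            path,
        )
    dist.barrier()  # keep all ranks in sync around the save

def load_checkpoint(model, optimizer, path: str = "ckpt.pt") -> int:
    # Load on CPU first to avoid every rank allocating GPU memory at once.
    ckpt = torch.load(path, map_location="cpu")
    model.module.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

Saving the optimizer state alongside the weights is what lets a crashed run resume mid-schedule instead of restarting the learning-rate and momentum state from scratch.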
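And a minimal sketch of the souping step from the third topic: a uniform average of the weights from the 3 second-stage runs. The filenames are hypothetical, and this assumes each file is a raw `state_dict` with identical keys:

```python
import torch

def soup(paths: list[str]) -> dict[str, torch.Tensor]:
    # Uniform "model soup": average each parameter across the runs.
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Hypothetical usage with the three stage-two runs:
# souped = soup(["run1.pt", "run2.pt", "run3.pt"])
# model.load_state_dict(souped)
```

Averaging works here because all 3 runs start from the same stage-one weights, so the fine-tuned models stay close enough in parameter space for their mean to remain a good model.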