---
license: mit
language: en
---
A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:
- Basics of attention and rotary position embeddings (RoPE) (a minimal sketch follows this list)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other details needed to keep the run healthy (see the checkpointing sketch below)
- Multi-phase training, including combining ("souping") the weights from 3 runs on smaller amounts of high-quality data to produce the second-stage model (see the souping sketch below)
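To illustrate the first point, here is a minimal sketch of RoPE applied to a query or key tensor. The function name `rotary_embedding` and the `(batch, seq_len, n_heads, head_dim)` layout are assumptions for this example, not the exact code in this repo.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim) query or key tensor; head_dim must be even.
    seq_len, head_dim = x.shape[1], x.shape[-1]
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    # Angle for every (position, frequency) pair: shape (seq_len, head_dim // 2).
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split channels into rotation pairs
    # Rotate each channel pair by its position-dependent angle.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2).type_as(x)
```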
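For the multi-GPU point, a common pattern with `torch.distributed` / DDP is to have only rank 0 write checkpoints and to synchronize the other ranks around the write. The `save_checkpoint` helper below is a hypothetical sketch of that pattern, not the repo's actual training loop.

```python
import torch
import torch.distributed as dist

def save_checkpoint(model, optimizer, step: int, path: str) -> None:
    # Only rank 0 writes, so the processes don't race on the same file.
    if dist.get_rank() == 0:
        torch.save(
            {
                "step": step,
                # Unwrap the DDP wrapper so the keys match a plain (non-DDP) model.
                "model": model.module.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            path,
        )
    # Every rank waits here until the checkpoint has been written.
    dist.barrier()
```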
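Souping here means averaging the weights of several runs into one model (as in "model soups"). A uniform average over the three second-stage runs could look like the sketch below; `soup_state_dicts` and the checkpoint paths are illustrative names, not the actual pipeline code.

```python
import torch

def soup_state_dicts(checkpoint_paths: list[str]) -> dict:
    # Uniform soup: elementwise mean of the parameters of several checkpoints
    # that share the same architecture and state-dict keys.
    state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
    souped = {}
    for key in state_dicts[0]:
        souped[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return souped

# e.g. model.load_state_dict(soup_state_dicts(["run1.pt", "run2.pt", "run3.pt"]))
```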
This repository contains the pretrained model and tokenizer.

See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.