---
license: mit
language: en
---
A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few
things:
- The basics of attention and rotary position embeddings (RoPE)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other
considerations needed to make a run succeed
- Multi-phase training, including combining ("souping") the model weights for the second stage
from 3 runs on smaller amounts of high-quality data
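The RoPE idea from the first point can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's actual implementation: it uses the split-halves pairing convention and recomputes the cos/sin tables on every call, whereas real implementations typically cache them and operate on `(batch, heads, seq, head_dim)` tensors. The function name `rope` and the default `base` of 10000 are assumptions for the example.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Sketch only: pairs of channels are rotated by position-dependent
    angles, so relative offsets are encoded in dot products. dim must
    be even.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i / dim)
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) * 2 / dim)
    # Angle for each (position, pair) combination
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each (x1_i, x2_i) pair by its angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because each pair is only rotated, the per-token norm is unchanged, and position 0 (angle 0) is left as-is.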
This repository contains the pretrained model and tokenizer.
See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.
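The souping step mentioned above amounts to averaging the weights of several fine-tuned runs. A minimal sketch, assuming a uniform average over identically shaped state dicts (the function name `soup` is hypothetical and not from this repository):

```python
import torch

def soup(state_dicts: list[dict]) -> dict:
    """Uniformly average model weights ("model souping") across runs.

    Assumes every state dict comes from the same architecture, so each
    key maps to tensors of identical shape in all runs.
    """
    averaged = {}
    for key in state_dicts[0]:
        # Stack the per-run tensors for this parameter and take the mean
        averaged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return averaged
```

The averaged dict can then be loaded back with `model.load_state_dict(averaged)` before continuing or evaluating.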