---
license: mit
language: en
---
A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:

- The basics of attention and rotary position embeddings (RoPE)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other considerations needed for a successful run
- Multi-phase training, including a second stage made of three runs on smaller amounts of high-quality data whose model weights are then combined ("souped")
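
The first bullet mentions RoPE. Here is a minimal sketch of the idea. It assumes the common interleaved-pair convention with base 10000, and the repository may instead use the split-half layout or a different base. RoPE rotates each pair of query/key features by an angle proportional to the token position, so the attention score between a rotated query and a rotated key depends only on their relative offset. It is written in plain Python for readability. The helpers `rope` and `dot` are hypothetical and not taken from the repository.

```python
import math
from typing import List, Sequence


def rope(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by position * base ** (-2i / d).

    Hypothetical helper using the interleaved-pair convention (an assumption).
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * d
    for i in range(d // 2):
        angle = position * base ** (-2 * i / d)  # lower-index pairs rotate faster
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Unscaled attention score between a query and a key."""
    return sum(a * b for a, b in zip(u, v))


q = [0.1, 0.2, 0.3, 0.4]
k = [0.5, -0.1, 0.2, 0.7]
# The same offset (2) at different absolute positions gives the same score.
print(math.isclose(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10))))  # True
```

In a PyTorch model the same rotation is usually precomputed as cosine/sine tables and applied to whole query and key tensors at once, just before the attention dot product.
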
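
The third bullet mentions souping the weights of the three second-stage runs. Here is a minimal sketch that assumes a uniform soup: the element-wise mean of parameters with identical names and shapes, taken from runs that share a starting checkpoint. The repository may weight or select runs differently, and `uniform_soup` is a hypothetical helper. It only relies on `+` and `/`, so it accepts plain floats as well as PyTorch tensors from `model.state_dict()`.

```python
from typing import Any, Dict, List, Mapping


def uniform_soup(state_dicts: List[Mapping[str, Any]]) -> Dict[str, Any]:
    """Average same-named parameters across several runs (hypothetical helper)."""
    if not state_dicts:
        raise ValueError("need at least one state dict")
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        # Souping only makes sense for runs that share one architecture
        # (and, in practice, one starting checkpoint).
        if sd.keys() != names:
            raise ValueError("all runs must have the same parameter names")
    n = len(state_dicts)
    soup: Dict[str, Any] = {}
    for name in names:
        total = state_dicts[0][name]
        for sd in state_dicts[1:]:
            total = total + sd[name]  # builds a new object; inputs are untouched
        soup[name] = total / n
    return soup


# Three toy "runs" with scalar weights standing in for tensors.
runs = [
    {"w": 1.0, "b": 0.0},
    {"w": 2.0, "b": 3.0},
    {"w": 3.0, "b": 6.0},
]
print(uniform_soup(runs))  # {'w': 2.0, 'b': 3.0}
```

With PyTorch, the averaged weights could then be loaded with `model.load_state_dict(uniform_soup([sd1, sd2, sd3]))`. Non-floating-point buffers such as integer counters are better copied from a single run than averaged.
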

This repository contains the pretrained model and tokenizer.
See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.