---
license: mit
language: en
---

A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:

  • The basics of attention and rotary position embeddings (RoPE)
  • Training a GPT-like model on multiple GPUs, including checkpointing and the other practical considerations needed for a successful run
  • Multi-phase training, including combining ("souping") the model weights from three second-stage runs, each on a smaller amount of high-quality data
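For the first point, here is a minimal sketch of how RoPE fits into causal self-attention, assuming the common "half-split" channel pairing (channel i is rotated together with channel i + head_dim/2) and PyTorch's built-in `F.scaled_dot_product_attention`. The helpers `rope_cache`, `apply_rope`, and `causal_self_attention` are hypothetical names for illustration, not necessarily how microgpt structures its code.

```python
import torch
import torch.nn.functional as F


def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0, device=None):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2)."""
    # One rotation frequency per channel pair: base^(-2i / head_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32, device=device) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32, device=device)
    angles = torch.outer(positions, inv_freq)  # angle = position * frequency
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of x (..., seq_len, head_dim) by position-dependent angles."""
    # Half-split pairing (an assumption): (x1[i], x2[i]) is rotated as a 2-D vector
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


def causal_self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, heads, seq_len, head_dim). RoPE is applied to q and k only."""
    cos, sin = rope_cache(q.size(-2), q.size(-1), device=q.device)
    q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
    # Causal mask: each position attends only to itself and earlier positions
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Because the query at position m and the key at position n are both rotated, their dot product depends only on the offset m - n, which is why RoPE encodes relative rather than absolute position. Values are left unrotated.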
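For the second point, one widely used checkpointing pattern in multi-GPU (DDP) training is sketched below as an assumption, not as microgpt's actual code: only rank 0 writes, the `DistributedDataParallel` wrapper is unwrapped via `.module` so the saved keys match a plain model, and the file is written to a temporary path and then renamed so an interrupted save cannot corrupt the previous checkpoint. `save_checkpoint` and `load_checkpoint` are hypothetical helper names.

```python
import os

import torch
import torch.nn as nn
import torch.distributed as dist


def _unwrap(model: nn.Module) -> nn.Module:
    # DDP wraps the real model in `.module`; save/load the inner module's weights
    return model.module if hasattr(model, "module") else model


def save_checkpoint(path: str, model: nn.Module, optimizer: torch.optim.Optimizer, step: int) -> None:
    # In a distributed run only rank 0 writes; other ranks should call
    # dist.barrier() afterwards before reading the file
    if dist.is_available() and dist.is_initialized() and dist.get_rank() != 0:
        return
    state = {"model": _unwrap(model).state_dict(), "optimizer": optimizer.state_dict(), "step": step}
    tmp_path = path + ".tmp"
    torch.save(state, tmp_path)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str, model: nn.Module, optimizer: torch.optim.Optimizer) -> int:
    # Load onto CPU first; load_state_dict copies values onto the model's device
    state = torch.load(path, map_location="cpu")
    _unwrap(model).load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]  # resume the training loop from this step
```

Saving the optimizer state and step alongside the weights is what lets a crashed or preempted run resume where it left off instead of restarting its learning-rate schedule.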
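For the third point, the simplest form of weight souping is a uniform soup: average each parameter elementwise across runs that share an architecture and typically start from the same checkpoint (here, the first-stage model). This README does not say whether microgpt uses uniform averaging, weighted averaging, or greedy selection, so the sketch below assumes uniform averaging; `soup_state_dicts` is a hypothetical helper and the `nn.Linear` modules stand in for the real GPT.

```python
import torch
import torch.nn as nn


def soup_state_dicts(state_dicts):
    """Uniform soup: elementwise mean of each parameter across several runs."""
    souped = {}
    for key, first in state_dicts[0].items():
        tensors = [sd[key] for sd in state_dicts]
        if first.is_floating_point():
            # Average in float32 for stability, then cast back to the original dtype
            stacked = torch.stack([t.float() for t in tensors])
            souped[key] = stacked.mean(dim=0).to(first.dtype)
        else:
            # Integer buffers (e.g. counters) are not averaged; keep the first run's copy
            souped[key] = first.clone()
    return souped


# Usage: three second-stage runs with identical architecture (stand-ins here)
runs = [nn.Linear(4, 3) for _ in range(3)]
merged = nn.Linear(4, 3)
merged.load_state_dict(soup_state_dicts([m.state_dict() for m in runs]))
```

Averaging works because fine-tuned runs that branch from one checkpoint tend to stay close in weight space; averaging unrelated models trained from different initializations generally does not produce a useful model.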

This repository contains the pretrained model and tokenizer.

See gpahal/microgpt for instructions on how to use the model.