---
license: mit
language: en
---

A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few
things:

- The basics of attention and rotary position embeddings (RoPE)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other
considerations needed to make the run succeed
- Multi-phase training, including combining (souping) the weights from 3 second-stage runs
trained on smaller amounts of high-quality data
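For the first point, a minimal sketch of RoPE: each pair of channels in a query or key vector is rotated by an angle proportional to the token's position, so relative positions are encoded directly in the dot product. This is an illustrative implementation I wrote for this card, not the exact code from the repository.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, n_heads, head_dim).

    Channels are split into two halves; each (x1, x2) pair is rotated by
    position * freq, where freqs follow the standard geometric schedule.
    """
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-pair rotation frequencies: base^(-i/half) for i in [0, half)
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    # Rotation angle for every (position, frequency) combination
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos = angles.cos()[:, None, :]  # broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # 2D rotation applied to each channel pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because each pair is only rotated, vector norms are preserved and position 0 is left unchanged (all angles are zero there).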
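The souping step mentioned above can be sketched as a uniform average of the parameters from the second-stage runs. This is a hypothetical helper illustrating the idea, not the repository's actual souping code:

```python
import torch

def soup(state_dicts: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Uniformly average parameters across checkpoints (a simple model soup)."""
    averaged = {}
    for key in state_dicts[0]:
        # Stack the same tensor from every run and take the elementwise mean
        averaged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return averaged
```

All checkpoints must share the same architecture (identical state-dict keys and shapes) for the average to be meaningful.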

This repository contains the pretrained model and tokenizer.

See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.