---
license: mit
language: en
---
A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:
- Basics of attention and rotary position embeddings (RoPE) (a minimal sketch follows this list)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other details needed to keep the run healthy (see the checkpointing sketch below)
- Multi-phase training, including combining ("souping") the weights from 3 runs on smaller amounts of high-quality data to produce the second-stage model (see the souping sketch below)
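To illustrate the first point, here is a minimal sketch of RoPE applied to a query or key tensor. The function name `rotary_embedding` and the `(batch, seq_len, n_heads, head_dim)` layout are assumptions for this example, not the exact code in this repo.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim) query or key tensor; head_dim must be even.
    seq_len, head_dim = x.shape[1], x.shape[-1]
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    # Angle for every (position, frequency) pair: shape (seq_len, head_dim // 2).
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split channels into rotation pairs
    # Rotate each channel pair by its position-dependent angle.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2).type_as(x)
```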
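For the multi-GPU point, a common pattern with `torch.distributed` / DDP is to have only rank 0 write checkpoints and to synchronize the other ranks around the write. The `save_checkpoint` helper below is a hypothetical sketch of that pattern, not the repo's actual training loop.

```python
import torch
import torch.distributed as dist

def save_checkpoint(model, optimizer, step: int, path: str) -> None:
    # Only rank 0 writes, so the processes don't race on the same file.
    if dist.get_rank() == 0:
        torch.save(
            {
                "step": step,
                # Unwrap the DDP wrapper so the keys match a plain (non-DDP) model.
                "model": model.module.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            path,
        )
    # Every rank waits here until the checkpoint has been written.
    dist.barrier()
```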
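Souping here means averaging the weights of several runs into one model (as in "model soups"). A uniform average over the three second-stage runs could look like the sketch below; `soup_state_dicts` and the checkpoint paths are illustrative names, not the actual pipeline code.

```python
import torch

def soup_state_dicts(checkpoint_paths: list[str]) -> dict:
    # Uniform soup: elementwise mean of the parameters of several checkpoints
    # that share the same architecture and state-dict keys.
    state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
    souped = {}
    for key in state_dicts[0]:
        souped[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return souped

# e.g. model.load_state_dict(soup_state_dicts(["run1.pt", "run2.pt", "run3.pt"]))
```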
This repository contains the pretrained model and tokenizer.

See [gpahal/microgpt](https://github.com/gpahal/microgpt) for instructions on how to use the model.