# MEMGPT

A GPT-2-style large language model (LLM) repository. The implementation includes full support for distributed training, sharded datasets, benchmark evaluation, and efficient text generation.
---

## 🧠 Features

- Transformer architecture based on GPT-2.
- Configurable training and model hyperparameters via JSON.
- Sharded dataset loading from `.npy` files.
- Mixed-precision training with `torch.autocast`.
- DDP (DistributedDataParallel) support.
- Evaluation support with HellaSwag.
- Modular codebase for easy extensibility.

---
## 📁 Project Structure

```bash
MEMGPT/
├── configs/
│   └── config.json          # Model and training configuration
│
├── data/
│   ├── edu_fineweb/         # Sharded training data
│   │   ├── train_000001.npy
│   │   ├── train_000002.npy
│   │   └── test_000001.npy
│   ├── hellaswag/
│   │   └── hellaswag_val.jsonl
│   └── fineweb.py           # Dataset sharding/processing logic
│
├── model_core/
│   ├── __init__.py
│   ├── attention.py         # Self-attention module
│   ├── model.py             # GPT-2 model architecture
│   ├── dataloader.py        # DataLoader_1 class
│   └── training.py          # train_nanogpt function
│
├── scripts/
│   ├── train.py             # Entry point to start training
│   ├── evaluate.py          # Run evaluation
│   └── generate.py          # Generate text from trained model
│
├── evaluation/
│   ├── __init__.py
│   ├── hellaswag.py         # HellaSwag dataset preparation
│   └── val_hellaswag.py     # HellaSwag scoring function
│
├── logs/
│   ├── log.txt              # Training log file
│   └── model_xxxxx.pt       # Checkpoint files
│
├── .gitignore
├── README.md
└── requirements.txt
```

---
## ⚙️ Configuration

Edit `configs/config.json` to configure your model and training setup.

Example:
```json
{
  "model": {
    "block_size": 1024,
    "vocab_size": 50304,
    "n_layer": 12,
    "n_head": 12,
    "n_embd": 768
  },
  "training": {
    "max_steps": 19073,
    "log_dir": "log",
    "total_batch_size": 524288,
    "B": 64,
    "T": 1024,
    "max_lr": 0.0006,
    "min_lr": 0.00006,
    "warmup_steps": 715,
    "weight_decay": 0.1,
    "learning_rate": 0.0006
  }
}
```
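The large `total_batch_size` is reached through gradient accumulation. A minimal sketch of the arithmetic, assuming (as in nanoGPT) that `total_batch_size` counts tokens: with `B = 64` and `T = 1024`, one micro-batch holds 65,536 tokens, so 524,288 / 65,536 = 8 accumulation steps per optimizer step on a single GPU.

```python
import json

# Hedged sketch: derive the gradient-accumulation step count from the config.
# Assumes total_batch_size is measured in tokens (nanoGPT convention).
with open("configs/config.json") as f:
    cfg = json.load(f)["training"]

tokens_per_micro_batch = cfg["B"] * cfg["T"]   # 64 * 1024 = 65,536
world_size = 1                                 # with DDP, multiply the stride by the GPU count
grad_accum_steps = cfg["total_batch_size"] // (tokens_per_micro_batch * world_size)
print(grad_accum_steps)                        # 524,288 // 65,536 = 8
```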
---

## 🚀 Training

To start training the model:

```bash
python scripts/train.py
```

This script internally calls `train_nanogpt()` from `model_core/training.py`, using the config in `configs/config.json`.
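For orientation, a minimal sketch of what such an entry point could look like; the exact `train_nanogpt()` signature is an assumption, not confirmed by this README:

```python
import json
from model_core.training import train_nanogpt

# Hypothetical entry point: keyword names are illustrative only.
with open("configs/config.json") as f:
    cfg = json.load(f)

train_nanogpt(model_config=cfg["model"], train_config=cfg["training"])
```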
### Optional: Distributed Training

To run training across multiple GPUs using PyTorch DDP:

```bash
torchrun --nproc_per_node=NUM_GPUS scripts/train.py
```

Replace `NUM_GPUS` with the number of GPUs you want to use.
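`torchrun` communicates with each worker process through environment variables. A minimal sketch of the process-group setup a nanoGPT-style script performs (an illustration, not necessarily this repo's exact code):

```python
import os
import torch
import torch.distributed as dist

# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process it spawns.
ddp = int(os.environ.get("RANK", -1)) != -1
if ddp:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"
else:
    device = "cuda" if torch.cuda.is_available() else "cpu"
```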
---

## 📊 Evaluation

To evaluate on HellaSwag:

```bash
python scripts/evaluate.py
```

Make sure the `hellaswag_val.jsonl` file is available under `data/hellaswag/`.
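HellaSwag is scored by ranking each example's four candidate endings by model likelihood. A hedged sketch of the idea behind `val_hellaswag.py` (the real implementation may differ; `model` is assumed to return logits of shape `(B, T, vocab)`):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_ending(model, tokens, mask):
    """tokens: (4, T) context+ending per candidate; mask: 1 on ending tokens."""
    logits = model(tokens)                              # (4, T, vocab), assumed
    shift_logits = logits[:, :-1, :]                    # predict token t+1 from token t
    shift_tokens = tokens[:, 1:]
    losses = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_tokens.reshape(-1),
        reduction="none",
    ).view(tokens.size(0), -1)
    shift_mask = mask[:, 1:].float()
    avg_loss = (losses * shift_mask).sum(1) / shift_mask.sum(1)
    return avg_loss.argmin().item()                     # lowest-loss ending wins
```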
---

## ✍️ Text Generation

To generate text from a trained model:

```bash
python scripts/generate.py
```

Make sure to adjust the generation script to point to the correct checkpoint under the `logs/` directory.
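Under the hood, generation is the standard GPT-2 sampling loop. A minimal sketch using top-k sampling (`model` is again assumed to return logits of shape `(B, T, vocab)`; the 1024 crop matches `block_size` in the config):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=64, top_k=50):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -1024:]                  # crop context to block_size
        logits = model(idx_cond)[:, -1, :]         # logits at the last position
        topk_vals, topk_idx = torch.topk(logits, top_k, dim=-1)
        probs = F.softmax(topk_vals, dim=-1)       # sample only among the top k
        next_tok = topk_idx.gather(-1, torch.multinomial(probs, 1))
        idx = torch.cat([idx, next_tok], dim=1)
    return idx
```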
---

## 🧩 Requirements

Install required packages:

```bash
pip install -r requirements.txt
```
---

## 📝 Notes

- Ensure your sharded `.npy` data is placed under `data/edu_fineweb/`.
- The log file and checkpoints are saved under `logs/`.
- `DataLoader_1` handles distributed, sharded data loading (see the sketch after this list).
- Training supports `bfloat16` autocasting for better efficiency (also sketched below).
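A hedged sketch of both ideas; the real `DataLoader_1` lives in `model_core/dataloader.py` and may differ in detail:

```python
import os
import numpy as np
import torch

class ShardedLoader:
    """Illustrative stand-in for DataLoader_1: walks .npy token shards and
    serves (B, T) batches, striding by rank so DDP processes stay disjoint."""
    def __init__(self, data_dir, split, B, T, rank=0, world_size=1):
        self.B, self.T = B, T
        self.rank, self.world_size = rank, world_size
        self.shards = sorted(
            os.path.join(data_dir, f) for f in os.listdir(data_dir)
            if f.startswith(split) and f.endswith(".npy"))
        self.shard_idx = 0
        self.tokens = torch.from_numpy(np.load(self.shards[0]).astype(np.int64))
        self.pos = B * T * rank

    def next_batch(self):
        B, T = self.B, self.T
        buf = self.tokens[self.pos : self.pos + B * T + 1]
        x, y = buf[:-1].view(B, T), buf[1:].view(B, T)  # inputs and shifted targets
        self.pos += B * T * self.world_size             # skip other ranks' slices
        if self.pos + B * T + 1 > len(self.tokens):     # shard exhausted: advance
            self.shard_idx = (self.shard_idx + 1) % len(self.shards)
            self.tokens = torch.from_numpy(
                np.load(self.shards[self.shard_idx]).astype(np.int64))
            self.pos = B * T * self.rank
        return x, y

# bfloat16 autocast wraps only the forward pass; unlike float16 it needs
# no GradScaler. The (logits, loss) return shape is an assumption here:
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#     logits, loss = model(x, y)
```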
---

## 🔮 License

MIT License. Feel free to modify and build upon this for research or commercial use.

---

## 🙏 Acknowledgements

Inspired by Andrej Karpathy's nanoGPT. Special thanks to Andrej Karpathy's YouTube tutorials and the open-source AI community.