saumilyajj committed · verified · Commit a5986f2 · 1 Parent(s): 47df44d

Update README.md

Files changed (1): README.md (+204 −204)
README.md CHANGED
# GPT from Scratch: A PyTorch Implementation

A comprehensive implementation of GPT-style transformer models built from scratch using PyTorch. This project demonstrates the core concepts of transformer architecture, attention mechanisms, and language modeling through hands-on experimentation.

## 🚀 Project Overview

This repository contains:

- **Two GPT implementations** with increasing complexity (GPTv1 and GPTv2)
- **Parallel data processing pipeline** for the OpenWebText dataset
- **Character-level tokenization** system
- **Training persistence and checkpointing**
- **Complete experimentation workflow**

## 📁 Project Structure

```
├── src/
│   ├── data-extraction.py           # Full dataset processing (OpenWebText)
│   └── data-extraction-2.py         # Sampled dataset processing (1% for quick iteration)
├── notebooks/
│   ├── GPTv1.ipynb                  # Basic GPT transformer implementation
│   ├── GPTv2.ipynb                  # Enhanced GPT with training persistence
│   └── ...                          # Additional experimental notebooks
├── artifacts/
│   ├── vocab.txt                    # Character vocabulary
│   ├── training_data.json           # Training metrics and history
│   ├── model-01.pkl                 # Saved model checkpoint
│   ├── output_train.txt             # Processed training data
│   └── output_val.txt               # Processed validation data
├── data/
│   └── MNIST/                       # Standard datasets
├── docs/
│   └── .github/
│       └── copilot-instructions.md  # AI agent guidelines
├── gradio_app.py                    # Interactive web interface for text generation
├── requirements.txt                 # Project dependencies
└── LICENSE                          # MIT License
```

## 🛠️ Installation

1. **Clone the repository:**

```bash
git clone https://huggingface.co/saumilyajj/GTP-on-Reddit
cd GTP-on-Reddit
```

2. **Install dependencies:**

```bash
pip install -r requirements.txt
```

3. **Download sample data:**

```bash
# Place your OpenWebText .xz files in the 'openwebtext' directory
# Or use the provided wizard-of-oz.txt for quick testing
```

## 🏃‍♂️ Quick Start

### 1. Data Processing

For quick experimentation (1% sample):

```bash
python src/data-extraction-2.py
```

For full dataset processing:

```bash
python src/data-extraction.py
```

### 2. Model Training

Open and run the Jupyter notebooks:

**GPTv1 (Basic Implementation):**

- Open `notebooks/GPTv1.ipynb`
- Focuses on core transformer concepts
- Uses `wizard-of-oz.txt` for training

**GPTv2 (Advanced Implementation):**

- Open `notebooks/GPTv2.ipynb`
- Includes training persistence and better monitoring
- Uses processed OpenWebText data
- Memory-mapped file handling for large datasets
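The memory-mapped file handling can be sketched as follows. This is a minimal illustration of the idea rather than the notebook's exact code; the function name and chunk-size arithmetic are assumptions:

```python
import mmap
import random

def get_random_chunk(path, block_size, batch_size):
    """Memory-map the text file and read one random span per call,
    so a large OpenWebText split never has to fit in RAM."""
    span = block_size * batch_size
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = random.randint(0, len(mm) - span - 1)
        mm.seek(start)
        data = mm.read(span)
    # Drop bytes that fall inside a multi-byte UTF-8 character at the edges
    return data.decode("utf-8", errors="ignore")
```

The decoded chunk would then be encoded with the character vocabulary and reshaped into `(batch_size, block_size)` training tensors.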

### 3. Interactive Web Interface

Launch the Gradio web interface for real-time text generation:

```bash
python gradio_app.py
```

**Features:**

- 🎯 Real-time text generation with your trained model
- 🌡️ Temperature control for creativity adjustment
- 🎲 Seed control for reproducible results
- 📊 Model information and architecture details
- 💡 Pre-built example prompts to get started

Access the interface at `http://localhost:7860` in your browser.
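Under the hood, temperature and seed controls typically reduce to logit scaling and RNG seeding. A minimal sketch of that mechanism (the function name is illustrative, not the app's actual API):

```python
import torch

def sample_next_token(logits, temperature=1.0, seed=None):
    """Scale logits by 1/temperature, then sample from the softmax.
    Lower temperature -> sharper distribution -> more conservative text."""
    if seed is not None:
        torch.manual_seed(seed)  # makes the draw reproducible
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

With the same seed and temperature, repeated calls pick the same token, which is what makes the interface's "reproducible results" option work.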

## 🏗️ Architecture Details

### Data Pipeline

- **Parallel Processing**: Uses `ProcessPoolExecutor` for efficient `.xz` file handling
- **Train/Validation Split**: 90/10 split with optional sampling
- **Character-Level Tokenization**: Direct character-to-integer mapping
- **Windows Compatibility**: Includes `freeze_support()` for multiprocessing
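The character-to-integer mapping fits in a few lines. In this sketch the sample string stands in for the real corpus, and the `chars` list is presumably what `artifacts/vocab.txt` stores:

```python
# Character-level tokenization: every distinct character gets an integer id.
text = "The Wizard of Oz"  # stand-in for the real training corpus

chars = sorted(set(text))                     # the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

Round-tripping is lossless for any string drawn from the vocabulary, which is why the same `vocab.txt` must be used at training and generation time.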

### Model Architecture

- **Multi-Head Attention**: Custom implementation with proper masking
- **Feed-Forward Networks**: Standard transformer FFN with dropout
- **Positional Embeddings**: Learned position encodings
- **Layer Normalization**: Applied throughout the network
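A single causal attention head in the spirit of this implementation might look like the sketch below; the class name and layer sizes are illustrative, and the notebooks' exact code may differ:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One causal self-attention head: each position attends only to
    itself and earlier positions via a lower-triangular mask."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))        # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))  # no peeking ahead
        wei = F.softmax(wei, dim=-1)
        return wei @ v                                                # (B, T, head_size)
```

Multi-head attention runs `n_head` of these in parallel and concatenates their outputs before a final projection.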

### Key Hyperparameters

```python
block_size = 8        # Context window size
batch_size = 128      # Training batch size
n_embd = 384          # Embedding dimension
n_head = 16           # Number of attention heads (32 in some versions)
n_layer = 16          # Number of transformer layers (32 in some versions)
dropout = 0.2         # Dropout rate
learning_rate = 3e-4  # Learning rate
```

## 📊 Training Features

- **Progress Tracking**: `tqdm` integration for real-time monitoring
- **Training Persistence**: JSON-based training history (GPTv2)
- **Model Checkpointing**: Pickle serialization for easy loading
- **Evaluation Loops**: Separate training/validation evaluation
- **Device Agnostic**: Automatic CUDA/CPU detection
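Pickle-based checkpointing can be as simple as the sketch below. The helper names are illustrative; the default path matches the `model-01.pkl` artifact, and unpickling requires the model's class definition to be importable:

```python
import pickle

def save_checkpoint(model, path="artifacts/model-01.pkl"):
    """Serialize the whole model object to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_checkpoint(path="artifacts/model-01.pkl"):
    """Restore a saved model; the defining class must be importable."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For larger projects, saving `model.state_dict()` with `torch.save` is the more robust convention, but whole-object pickling keeps the notebooks simple.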

## 🔧 Usage Examples

### Training a Model

```python
# Pick a device and configure hyperparameters
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Initialize the model
model = GPTLanguageModel(vocab_size)
model = model.to(device)

# Train with monitoring
for step in range(max_iters):
    # ... training loop with loss tracking ...
    # ... evaluation and checkpointing every eval_iters steps ...
```

### Generating Text

```python
# Generate text from the trained model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
generated = decode(model.generate(context, max_new_tokens=500)[0].tolist())
print(generated)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by Andrej Karpathy's "Let's build GPT" series
- Based on the "Attention Is All You Need" paper
- Uses the OpenWebText dataset for training
- Built with the PyTorch framework

## 📚 Learning Resources

- [Attention Is All You Need Paper](https://arxiv.org/abs/1706.03762)
- [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/)
- [Andrej Karpathy's GPT Tutorial](https://www.youtube.com/watch?v=kCc8FmEb1nY)

---

**Happy Learning! 🎓**