---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# Shakespeare GPT 🎭
A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.
**Prepared by:** Shivranjan Kolvankar
## 📖 Overview
This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.
## ✨ Features
- **From-scratch implementation** of GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with notebook-based training script
## ๐Ÿ—๏ธ Model Architecture
The model follows the GPT-2 architecture with the following specifications:
- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M
### Architecture Components
- **Causal Self-Attention:** Multi-head attention with causal masking
- **Feed-Forward Network (MLP):** Two-layer MLP with GELU activation
- **Layer Normalization:** Pre-norm architecture
- **Residual Connections:** Skip connections around attention and MLP
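The causal self-attention component can be illustrated with a minimal single-head sketch in plain Python (the real model uses batched, multi-head tensor operations in PyTorch; this just shows the masking logic):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats), each of dimension d.
    Position t may only attend to positions 0..t.
    """
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # Scores against keys 0..t only; future positions are masked out.
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(d)])
    return out
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector.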
## ๐Ÿ“ Project Structure
```
app/
โ”œโ”€โ”€ app.py # Main Gradio application
โ”œโ”€โ”€ requirementx.txt # Python dependencies
โ”œโ”€โ”€ models/
โ”‚ โ””โ”€โ”€ model_gpt2-124m.pth # Trained model weights
โ”œโ”€โ”€ train/
โ”‚ โ””โ”€โ”€ GPT_2_124M_Model_From_Scratch.ipynb # Training notebook
โ””โ”€โ”€ README.md # This file
```
## 🚀 Installation
### Prerequisites
- Python 3.9 or higher
- pip (Python package manager)
### Setup
1. **Clone the repository** (or navigate to the project directory):
```bash
cd app
```
2. **Create a virtual environment** (recommended):
```bash
python -m venv venv
```
3. **Activate the virtual environment**:
- **Windows:**
```bash
venv\Scripts\activate
```
- **Linux/Mac:**
```bash
source venv/bin/activate
```
4. **Install dependencies**:
```bash
pip install -r requirements.txt
```
Or manually install:
```bash
pip install torch gradio
```
## 🎯 Usage
### Running the Application
1. **Ensure the model file exists**:
- The trained model should be located at `models/model_gpt2-124m.pth`
- If not present, you'll need to train the model first (see Training section)
2. **Run the Gradio app**:
```bash
python app.py
```
3. **Access the web interface**:
- The app will start a local server
- Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`)
### Using the Interface
1. **Enter a prompt** in the text box (e.g., "JULIET:" or "My Name is shivranjan")
2. **Adjust Max New Tokens** using the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box
### Example Prompts
- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`
## 🎓 Training
The model can be trained using the Jupyter notebook:
1. **Open the training notebook**:
- `train/GPT_2_124M_Model_From_Scratch.ipynb`
2. **Configure training parameters**:
- Set `CONFIG_TYPE = 'gpt2-124m'` for the full model
- Adjust hyperparameters as needed (learning rate, batch size, etc.)
3. **Provide training data**:
- The notebook expects `input.txt` with Shakespeare's works
- Update the `data_file` path in the notebook
4. **Run training**:
- Execute all cells in the notebook
- Training will save the model to `model_gpt2-124m.pth`
### Training Configuration
The model was trained with the following hyperparameters:
- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU
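The hyperparameters above can be collected into a config dictionary like the following sketch (the key names here are illustrative; the actual notebook may organize its configuration differently):

```python
# Illustrative config; key names are assumptions, not the notebook's exact API.
config = {
    "block_size": 1024,      # context window
    "batch_size": 16,
    "learning_rate": 1e-4,
    "max_iters": 5000,
    "eval_interval": 100,
    "n_layer": 12,           # transformer blocks
    "n_head": 12,            # attention heads
    "n_embd": 936,           # embedding dimension
    "vocab_size": 65,
    "dropout": 0.1,
}
```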
## 🔧 Technical Details
### Character Vocabulary
The model uses a 65-character vocabulary:
- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Digits: `3` (the only digit that appears in the corpus)
- Letters: `A-Z`, `a-z`
### Tokenization
- **Encoding:** Character-level encoding (each character maps to an integer)
- **Decoding:** Integer-to-character mapping
- **Unknown Characters:** Characters not in the vocabulary are filtered out during encoding
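The encode/decode behavior described above can be sketched in a few lines of plain Python (the real model builds its 65-character vocabulary from the full Shakespeare corpus; a short stand-in string is used here):

```python
# Build the vocabulary from the training text (tiny stand-in corpus here).
text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> character

def encode(s):
    """Map each known character to its id; unknown characters are dropped."""
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids):
    """Map integer ids back to characters."""
    return "".join(itos[i] for i in ids)
```

A round trip over in-vocabulary text is lossless, while out-of-vocabulary characters are silently filtered out during encoding.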
### Generation Strategy
- **Method:** Autoregressive generation (greedy decoding)
- **Temperature:** N/A (uses argmax)
- **Context Window:** Up to 1024 characters
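The greedy autoregressive loop amounts to: crop the context to the block size, score the next token, append the argmax, repeat. A minimal sketch with a toy scoring function (the real app calls the trained model for logits):

```python
def generate(logits_fn, context, max_new_tokens, block_size=8):
    """Greedy autoregressive generation: repeatedly append the argmax token.

    logits_fn(context) -> list of scores over the vocabulary for the next token.
    """
    out = list(context)
    for _ in range(max_new_tokens):
        ctx = out[-block_size:]  # crop to the context window
        logits = logits_fn(ctx)
        out.append(max(range(len(logits)), key=logits.__getitem__))  # argmax
    return out

# Toy "model" over a 5-token vocabulary: always prefers last_token + 1 (mod 5).
toy = lambda ctx: [1.0 if i == (ctx[-1] + 1) % 5 else 0.0 for i in range(5)]
```

With the toy scorer, `generate(toy, [0], 4)` produces the deterministic sequence `[0, 1, 2, 3, 4]`, which is exactly the behavior of argmax decoding: no randomness, same output every run.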
## 📊 Performance Notes
- **CPU Inference:** Slower (may take 1-5 seconds per token)
- **GPU Inference:** Faster (recommended for better performance)
- **Generation Speed:** Depends on hardware and number of tokens
## ๐Ÿ› ๏ธ Dependencies
- **torch:** PyTorch for deep learning operations
- **gradio:** Web interface framework
- **Optional:** CUDA-enabled PyTorch for GPU acceleration
## ๐Ÿ“ Notes
- The model is trained specifically on Shakespeare's works
- Generated text may not always be coherent (depends on training quality)
- Character-level models are slower but provide fine-grained control
- The model weights are saved as a PyTorch state dictionary (`.pth` file)
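Saving and restoring a state dictionary follows the standard PyTorch pattern. A self-contained round-trip sketch with a stand-in module (the real app would instantiate its GPT class and load `models/model_gpt2-124m.pth` instead):

```python
import torch
import torch.nn as nn

# Stand-in module; the real app instantiates its GPT model class here.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

net = TinyNet()
torch.save(net.state_dict(), "weights.pth")   # what the training notebook does

restored = TinyNet()
restored.load_state_dict(torch.load("weights.pth", map_location="cpu"))
restored.eval()  # disable dropout before generation
```

`map_location="cpu"` lets weights trained on a GPU load on a CPU-only machine, and `eval()` matters here because the architecture uses dropout.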
## 🔮 Future Improvements
- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output
## 📄 License
This project is for educational purposes.
## 👤 Author
**Shivranjan Kolvankar**
---
## ๐Ÿ™ Acknowledgments
- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- PyTorch team for the deep learning framework
- Gradio team for the web interface framework
- William Shakespeare for the training data
---
**Enjoy generating Shakespearean text! 🎭**