---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented in PyTorch and served via Gradio.

**Prepared by:** Shivranjan Kolvankar

## 📖 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character by character, maintaining the style and vocabulary of Shakespearean English.

## ✨ Features

- **From-scratch implementation** of the GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with a notebook-based training script

## 🏗️ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M

### Architecture Components

- **Causal Self-Attention:** Multi-head attention with causal masking
- **Feed-Forward Network (MLP):** Two-layer MLP with GELU activation
- **Layer Normalization:** Pre-norm architecture
- **Residual Connections:** Skip connections around attention and MLP

## 📁 Project Structure

```
app/
├── app.py                                    # Main Gradio application
├── requirementx.txt                          # Python dependencies
├── models/
│   └── model_gpt2-124m.pth                   # Trained model weights
├── train/
│   └── GPT_2_124M_Model_From_Scratch.ipynb   # Training notebook
└── README.md                                 # This file
```

## 🚀 Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package manager)

### Setup

1. **Clone the repository** (or navigate to the project directory):

   ```bash
   cd app
   ```

2. **Create a virtual environment** (recommended):

   ```bash
   python -m venv venv
   ```

3. **Activate the virtual environment**:

   - **Windows:**

     ```bash
     venv\Scripts\activate
     ```

   - **Linux/macOS:**

     ```bash
     source venv/bin/activate
     ```

4. **Install dependencies**:

   ```bash
   pip install -r requirementx.txt
   ```

   Or install them manually:

   ```bash
   pip install torch gradio
   ```

## 🎯 Usage

### Running the Application

1. **Ensure the model file exists**:
   - The trained model should be located at `models/model_gpt2-124m.pth`.
   - If it is not present, train the model first (see the Training section).

2. **Run the Gradio app**:

   ```bash
   python app.py
   ```

3. **Access the web interface**:
   - The app starts a local server.
   - Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).

### Using the Interface

1. **Enter a prompt** in the text box (e.g., "JULIET:" or "My Name is shivranjan")
2. **Adjust Max New Tokens** using the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box

### Example Prompts

- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`

## 🎓 Training

The model can be trained using the Jupyter notebook:

1. **Open the training notebook**:
   - `train/GPT_2_124M_Model_From_Scratch.ipynb`

2. **Configure training parameters**:
   - Set `CONFIG_TYPE = 'gpt2-124m'` for the full model.
   - Adjust hyperparameters (learning rate, batch size, etc.) as needed.

3. **Provide training data**:
   - The notebook expects `input.txt` containing Shakespeare's works.
   - Update the `data_file` path in the notebook.
4. **Run training**:
   - Execute all cells in the notebook.
   - Training saves the model to `model_gpt2-124m.pth`.

### Training Configuration

The model was trained with the following hyperparameters:

- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU

## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Numbers: `3` (the only digit that appears in the corpus)
- Letters: `A-Z`, `a-z`

### Tokenization

- **Encoding:** Character-level encoding (each character maps to an integer)
- **Decoding:** Integer-to-character mapping
- **Unknown Characters:** Characters not in the vocabulary are filtered out during encoding

### Generation Strategy

- **Method:** Autoregressive generation (greedy decoding)
- **Temperature:** N/A (uses argmax)
- **Context Window:** Up to 1024 characters

## 📊 Performance Notes

- **CPU Inference:** Slower (may take 1-5 seconds per token)
- **GPU Inference:** Faster (recommended for better performance)
- **Generation Speed:** Depends on hardware and the number of tokens requested

## 🛠️ Dependencies

- **torch:** PyTorch, for deep learning operations
- **gradio:** Web interface framework
- **Optional:** CUDA-enabled PyTorch for GPU acceleration

## 📝 Notes

- The model is trained specifically on Shakespeare's works.
- Generated text may not always be coherent (this depends on training quality).
- Character-level models are slower but offer fine-grained control.
- The model weights are saved as a PyTorch state dictionary (`.pth` file).

## 🔮 Future Improvements

- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output

## 📄 License

This project is for educational purposes.
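The character-level tokenization described under Technical Details can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code in `app.py`; the names `corpus`, `stoi`, `itos`, `encode`, and `decode` are assumptions:

```python
# Illustrative character-level tokenizer (names are assumptions,
# not necessarily those used in app.py).
corpus = "To be or not to be"                  # stand-in for Shakespeare's works
chars = sorted(set(corpus))                    # unique characters form the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

def encode(s: str) -> list[int]:
    # Characters outside the vocabulary are filtered out, as noted above.
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(decode(encode("to be!")))  # '!' is not in this toy vocabulary -> "to be"
```

In the real model the vocabulary is built from the full training text, giving the 65 characters listed above rather than this toy set.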
## 👤 Author

**Shivranjan Kolvankar**

---

## 🙏 Acknowledgments

- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- The PyTorch team for the deep learning framework
- The Gradio team for the web interface framework
- William Shakespeare for the training data

---

**Enjoy generating Shakespearean text! 🎭**
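As a starting point for the sampling strategies listed under Future Improvements, temperature plus top-k sampling might look like the sketch below. `sample_next` is a hypothetical helper, and the real model's logits would come from a forward pass rather than `torch.randn`:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 10) -> int:
    """Sample one token id with temperature and top-k filtering.

    Hypothetical helper for illustration, not part of app.py.
    """
    logits = logits / temperature                 # <1 sharpens, >1 flattens the distribution
    v, _ = torch.topk(logits, top_k)              # the k largest logits
    logits = logits.masked_fill(logits < v[-1], float("-inf"))  # drop everything else
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# Stand-in logits over the 65-character vocabulary.
fake_logits = torch.randn(65)
print(sample_next(fake_logits))
```

With `top_k=1` this reduces to the greedy (argmax) decoding the app currently uses; larger `top_k` and higher temperature trade coherence for variety.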