---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented in PyTorch and served via Gradio.

**Prepared by:** Shivranjan Kolvankar

## 📖 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character by character, maintaining the style and vocabulary of Shakespearean English.

## ✨ Features

- **From-scratch implementation** of the GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with a notebook-based training script

## 🏗️ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M

### Architecture Components

- **Causal Self-Attention:** Multi-head attention with causal masking
- **Feed-Forward Network (MLP):** Two-layer MLP with GELU activation
- **Layer Normalization:** Pre-norm architecture
- **Residual Connections:** Skip connections around attention and MLP

## 📁 Project Structure

```
app/
├── app.py                                    # Main Gradio application
├── requirementx.txt                          # Python dependencies
├── models/
│   └── model_gpt2-124m.pth                   # Trained model weights
├── train/
│   └── GPT_2_124M_Model_From_Scratch.ipynb   # Training notebook
└── README.md                                 # This file
```

## 🚀 Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package manager)

### Setup

1. **Clone the repository** (or navigate to the project directory):

   ```bash
   cd app
   ```

2. **Create a virtual environment** (recommended):

   ```bash
   python -m venv venv
   ```

3. **Activate the virtual environment**:

   - **Windows:**

     ```bash
     venv\Scripts\activate
     ```

   - **Linux/macOS:**

     ```bash
     source venv/bin/activate
     ```

4. **Install dependencies**:

   ```bash
   pip install -r requirementx.txt
   ```

   Or install them manually:

   ```bash
   pip install torch gradio
   ```

## 🎯 Usage

### Running the Application

1. **Ensure the model file exists**:
   - The trained model should be located at `models/model_gpt2-124m.pth`.
   - If it is not present, train the model first (see the Training section).

2. **Run the Gradio app**:

   ```bash
   python app.py
   ```

3. **Access the web interface**:
   - The app starts a local server.
   - Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).

### Using the Interface

1. **Enter a prompt** in the text box (e.g., "JULIET:" or "My Name is shivranjan")
2. **Adjust Max New Tokens** using the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box

### Example Prompts

- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`

## 🎓 Training

The model can be trained using the Jupyter notebook:

1. **Open the training notebook**:
   - `train/GPT_2_124M_Model_From_Scratch.ipynb`

2. **Configure training parameters**:
   - Set `CONFIG_TYPE = 'gpt2-124m'` for the full model.
   - Adjust hyperparameters (learning rate, batch size, etc.) as needed.

3. **Provide training data**:
   - The notebook expects `input.txt` containing Shakespeare's works.
   - Update the `data_file` path in the notebook.
4. **Run training**:
   - Execute all cells in the notebook.
   - Training saves the model to `model_gpt2-124m.pth`.

### Training Configuration

The model was trained with the following hyperparameters:

- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU

## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Numbers: `3` (the only digit that appears in the corpus)
- Letters: `A-Z`, `a-z`

### Tokenization

- **Encoding:** Character-level encoding (each character maps to an integer)
- **Decoding:** Integer-to-character mapping
- **Unknown Characters:** Characters not in the vocabulary are filtered out during encoding

### Generation Strategy

- **Method:** Autoregressive generation (greedy decoding)
- **Temperature:** N/A (uses argmax)
- **Context Window:** Up to 1024 characters

## 📊 Performance Notes

- **CPU Inference:** Slower (may take 1-5 seconds per token)
- **GPU Inference:** Faster (recommended for better performance)
- **Generation Speed:** Depends on hardware and the number of tokens requested

## 🛠️ Dependencies

- **torch:** PyTorch, for deep learning operations
- **gradio:** Web interface framework
- **Optional:** CUDA-enabled PyTorch for GPU acceleration

## 📝 Notes

- The model is trained specifically on Shakespeare's works.
- Generated text may not always be coherent (this depends on training quality).
- Character-level models are slower but offer fine-grained control.
- The model weights are saved as a PyTorch state dictionary (`.pth` file).

## 🔮 Future Improvements

- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output

## 📄 License

This project is for educational purposes.
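The character-level tokenization described under Technical Details can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code in `app.py`; the names `corpus`, `stoi`, `itos`, `encode`, and `decode` are assumptions:

```python
# Illustrative character-level tokenizer (names are assumptions,
# not necessarily those used in app.py).
corpus = "To be or not to be"                  # stand-in for Shakespeare's works
chars = sorted(set(corpus))                    # unique characters form the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

def encode(s: str) -> list[int]:
    # Characters outside the vocabulary are filtered out, as noted above.
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(decode(encode("to be!")))  # '!' is not in this toy vocabulary -> "to be"
```

In the real model the vocabulary is built from the full training text, giving the 65 characters listed above rather than this toy set.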
## 👤 Author

**Shivranjan Kolvankar**

---

## 🙏 Acknowledgments

- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- The PyTorch team for the deep learning framework
- The Gradio team for the web interface framework
- William Shakespeare for the training data

---

**Enjoy generating Shakespearean text! 🎭**
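As a starting point for the sampling strategies listed under Future Improvements, temperature plus top-k sampling might look like the sketch below. `sample_next` is a hypothetical helper, and the real model's logits would come from a forward pass rather than `torch.randn`:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 10) -> int:
    """Sample one token id with temperature and top-k filtering.

    Hypothetical helper for illustration, not part of app.py.
    """
    logits = logits / temperature                 # <1 sharpens, >1 flattens the distribution
    v, _ = torch.topk(logits, top_k)              # the k largest logits
    logits = logits.masked_fill(logits < v[-1], float("-inf"))  # drop everything else
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# Stand-in logits over the 65-character vocabulary.
fake_logits = torch.randn(65)
print(sample_next(fake_logits))
```

With `top_k=1` this reduces to the greedy (argmax) decoding the app currently uses; larger `top_k` and higher temperature trade coherence for variety.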