---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# Shakespeare GPT 🎭
A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.
Prepared by: Shivranjan Kolvankar
## 📖 Overview
This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.
## ✨ Features
- From-scratch implementation of GPT architecture (no pre-trained weights)
- Character-level tokenization (65-character vocabulary)
- Gradio web interface for interactive text generation
- Custom model architecture with configurable hyperparameters
- Complete training pipeline with notebook-based training script
## 🏗️ Model Architecture
The model follows the GPT-2 architecture with the following specifications:
- Layers: 12 transformer blocks
- Attention Heads: 12
- Embedding Dimension: 936
- Context Window (Block Size): 1024 tokens
- Vocabulary Size: 65 characters
- Dropout: 0.1
- Parameters: ~85M
### Architecture Components
- Causal Self-Attention: Multi-head attention with causal masking
- Feed-Forward Network (MLP): Two-layer MLP with GELU activation
- Layer Normalization: Pre-norm architecture
- Residual Connections: Skip connections around attention and MLP
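The components above can be sketched as a single pre-norm transformer block. This is a minimal illustration using the hyperparameters listed (embedding dimension 936, 12 heads, dropout 0.1) and PyTorch's built-in `nn.MultiheadAttention`; the actual notebook implements attention by hand, so treat this as a structural sketch rather than the project's exact code.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: LayerNorm -> attention -> residual,
    then LayerNorm -> MLP -> residual."""

    def __init__(self, n_embd=936, n_head=12, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(          # two-layer MLP with GELU activation
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        T = x.size(1)
        # causal mask: True entries are blocked, so position i only sees <= i
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)                    # pre-norm: normalize before attention
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                          # residual connection around attention
        x = x + self.mlp(self.ln2(x))      # residual connection around MLP
        return x
```

The full model stacks 12 of these blocks between the token/position embeddings and the final language-model head.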
## 📁 Project Structure

```
app/
├── app.py                                  # Main Gradio application
├── requirements.txt                        # Python dependencies
├── models/
│   └── model_gpt2-124m.pth                 # Trained model weights
├── train/
│   └── GPT_2_124M_Model_From_Scratch.ipynb # Training notebook
└── README.md                               # This file
```
## 🚀 Installation
### Prerequisites
- Python 3.9 or higher
- pip (Python package manager)
### Setup

1. Clone the repository (or navigate to the project directory):

   ```bash
   cd app
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   ```

3. Activate the virtual environment:

   - Windows:

     ```bash
     venv\Scripts\activate
     ```

   - Linux/Mac:

     ```bash
     source venv/bin/activate
     ```

4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or install them manually:

   ```bash
   pip install torch gradio
   ```
## 🎯 Usage
### Running the Application

1. Ensure the model file exists:

   - The trained model should be located at `models/model_gpt2-124m.pth`
   - If not present, you'll need to train the model first (see the Training section)

2. Run the Gradio app:

   ```bash
   python app.py
   ```

3. Access the web interface:

   - The app will start a local server
   - Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`)
### Using the Interface
- Enter a prompt in the text box (e.g., "JULIET:" or "My Name is shivranjan")
- Adjust Max New Tokens using the slider (50-1000 tokens, default: 300)
- Click Submit or press Enter to generate text
- View the generated text in the output box
### Example Prompts

```
JULIET:
ROMEO:
To be or not to be
My Name is shivranjan
```
## 📚 Training
The model can be trained using the Jupyter notebook:

1. Open the training notebook: `train/GPT_2_124M_Model_From_Scratch.ipynb`

2. Configure training parameters:

   - Set `CONFIG_TYPE = 'gpt2-124m'` for the full model
   - Adjust hyperparameters as needed (learning rate, batch size, etc.)

3. Provide training data:

   - The notebook expects `input.txt` with Shakespeare's works
   - Update the `data_file` path in the notebook

4. Run training:

   - Execute all cells in the notebook
   - Training will save the model to `model_gpt2-124m.pth`
### Training Configuration
The model was trained with the following hyperparameters:
- Block Size: 1024
- Batch Size: 16
- Learning Rate: 1e-4
- Max Iterations: 5000
- Evaluation Interval: 100
- Device: CUDA (GPU recommended) or CPU
## 🔧 Technical Details
### Character Vocabulary

The model uses a 65-character vocabulary:

- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Numbers: `3`
- Letters: `A-Z`, `a-z`
### Tokenization
- Encoding: Character-level encoding (each character maps to an integer)
- Decoding: Integer-to-character mapping
- Unknown Characters: Characters not in the vocabulary are filtered out during encoding
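The tokenizer described above amounts to two dictionary lookups. A minimal sketch, using a short sample string in place of the real 65-character training vocabulary:

```python
# Build the vocabulary from the training text, as a char-level tokenizer does.
text = "First Citizen:\nBefore we proceed any further, hear me speak."
chars = sorted(set(text))                 # the real model ends up with 65 chars
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

def encode(s):
    # Characters not in the vocabulary are silently filtered out.
    return [stoi[c] for c in s if c in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)

assert decode(encode("hear me speak")) == "hear me speak"  # lossless round trip
assert encode("héar") == encode("har")  # 'é' is not in the vocab, so it is dropped
```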
### Generation Strategy
- Method: Autoregressive generation (greedy decoding)
- Temperature: N/A (uses argmax)
- Context Window: Up to 1024 characters
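The greedy autoregressive loop described above looks roughly like this. `ToyModel` is a stand-in that always favours token 0 so the sketch is self-contained; the real model returns logits over the 65-character vocabulary.

```python
import torch

class ToyModel(torch.nn.Module):
    """Stand-in for the trained GPT: uniform logits except token 0."""
    def forward(self, idx):
        B, T = idx.shape
        logits = torch.zeros(B, T, 65)   # (batch, time, vocab_size)
        logits[:, :, 0] = 1.0            # always prefer token 0
        return logits

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=1024):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits = model(idx_cond)[:, -1, :]       # logits at the last position
        next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy (argmax)
        idx = torch.cat([idx, next_id], dim=1)   # append and feed back in
    return idx

out = generate(ToyModel(), torch.tensor([[5, 9]]), max_new_tokens=3)
```

Because decoding is pure argmax, the same prompt always produces the same continuation; the sampling strategies under Future Improvements would change that.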
## 📊 Performance Notes
- CPU Inference: Slower (may take 1-5 seconds per token)
- GPU Inference: Faster (recommended for better performance)
- Generation Speed: Depends on hardware and number of tokens
## 🛠️ Dependencies
- torch: PyTorch for deep learning operations
- gradio: Web interface framework
- Optional: CUDA-enabled PyTorch for GPU acceleration
## 📝 Notes
- The model is trained specifically on Shakespeare's works
- Generated text may not always be coherent (depends on training quality)
- Character-level models are slower but provide fine-grained control
- The model weights are saved as a PyTorch state dictionary (`.pth` file)
## 🔮 Future Improvements
- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output
## 📄 License
This project is for educational purposes.
## 👤 Author
Shivranjan Kolvankar
## 🙏 Acknowledgments
- Andrej Karpathy's nanoGPT for architecture inspiration
- PyTorch team for the deep learning framework
- Gradio team for the web interface framework
- William Shakespeare for the training data
Enjoy generating Shakespearean text! 🎭