---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented in PyTorch and served via Gradio.

**Prepared by:** Shivranjan Kolvankar

## 📋 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character by character, reproducing the style and vocabulary of Shakespearean English.
## ✨ Features

- **From-scratch implementation** of the GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with a notebook-based training script
## 🏗️ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M

### Architecture Components

- **Causal Self-Attention:** multi-head attention with causal masking
- **Feed-Forward Network (MLP):** two-layer MLP with GELU activation
- **Layer Normalization:** pre-norm placement (LayerNorm before attention and MLP)
- **Residual Connections:** skip connections around both the attention and MLP sub-layers
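To make the components above concrete, here is a minimal sketch of a causal self-attention layer in PyTorch. This is illustrative only, not the project's actual code; the class and parameter names (`CausalSelfAttention`, `n_embd`, `n_head`, `block_size`) are assumptions modeled on common GPT implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sketch)."""

    def __init__(self, n_embd: int, n_head: int, block_size: int, dropout: float = 0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.dropout = nn.Dropout(dropout)
        # lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```

With this model's settings, `CausalSelfAttention(936, 12, 1024)` maps a `(B, T, 936)` input to a `(B, T, 936)` output; the residual connection and pre-norm LayerNorm would wrap this layer inside each transformer block.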
## 📁 Project Structure

```
app/
├── app.py                                   # Main Gradio application
├── requirements.txt                         # Python dependencies
├── models/
│   └── model_gpt2-124m.pth                  # Trained model weights
├── train/
│   └── GPT_2_124M_Model_From_Scratch.ipynb  # Training notebook
└── README.md                                # This file
```
## 🚀 Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package manager)

### Setup

1. **Clone the repository** (or navigate to the project directory):
   ```bash
   cd app
   ```
2. **Create a virtual environment** (recommended):
   ```bash
   python -m venv venv
   ```
3. **Activate the virtual environment**:
   - **Windows:**
     ```bash
     venv\Scripts\activate
     ```
   - **Linux/Mac:**
     ```bash
     source venv/bin/activate
     ```
4. **Install the dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
   Or install them manually:
   ```bash
   pip install torch gradio
   ```
## 🎯 Usage

### Running the Application

1. **Ensure the model file exists**:
   - The trained model should be located at `models/model_gpt2-124m.pth`
   - If it is not present, train the model first (see the Training section)
2. **Run the Gradio app**:
   ```bash
   python app.py
   ```
3. **Access the web interface**:
   - The app starts a local server
   - Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`)

### Using the Interface

1. **Enter a prompt** in the text box (e.g., `JULIET:` or `My Name is shivranjan`)
2. **Adjust Max New Tokens** with the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box

### Example Prompts

- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`
## 📚 Training

The model can be trained using the Jupyter notebook:

1. **Open the training notebook**:
   - `train/GPT_2_124M_Model_From_Scratch.ipynb`
2. **Configure the training parameters**:
   - Set `CONFIG_TYPE = 'gpt2-124m'` for the full model
   - Adjust hyperparameters as needed (learning rate, batch size, etc.)
3. **Provide the training data**:
   - The notebook expects an `input.txt` file containing Shakespeare's works
   - Update the `data_file` path in the notebook accordingly
4. **Run the training**:
   - Execute all cells in the notebook
   - Training saves the model to `model_gpt2-124m.pth`

### Training Configuration

The model was trained with the following hyperparameters:

- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU
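For reference, the hyperparameters above could be collected in a configuration dictionary like the following. This is a hypothetical sketch, not the notebook's actual code; the key names (`block_size`, `max_iters`, etc.) are assumptions.

```python
import torch

# Hypothetical configuration mirroring the hyperparameters listed above;
# the notebook's actual variable names may differ.
train_config = {
    "block_size": 1024,      # context window (characters per training sequence)
    "batch_size": 16,        # sequences per optimization step
    "learning_rate": 1e-4,   # optimizer step size
    "max_iters": 5000,       # total optimization steps
    "eval_interval": 100,    # how often to estimate validation loss
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}
```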
## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Digits: `3` (the only digit appearing in the corpus)
- Letters: `A-Z`, `a-z`

### Tokenization

- **Encoding:** character-level (each character maps to an integer)
- **Decoding:** integer-to-character mapping
- **Unknown Characters:** characters outside the vocabulary are filtered out during encoding
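A character-level codec of this kind can be sketched in a few lines of plain Python. This is illustrative, not the notebook's actual code; the short `text` string stands in for the full `input.txt` corpus, and the name choices (`stoi`, `itos`) are assumptions.

```python
# Minimal character-level codec (illustrative sketch).
# The vocabulary is the sorted set of unique characters in the training corpus.
text = "To be, or not to be"                    # stand-in for the full input.txt
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}    # character -> integer
itos = {i: ch for ch, i in stoi.items()}        # integer -> character

def encode(s: str) -> list:
    # characters outside the vocabulary are silently dropped
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids: list) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("not to be")) == "not to be"
assert decode(encode("not@to@be")) == "nottobe"   # '@' is not in the vocabulary
```

Filtering unknown characters on encode (rather than raising an error) is what lets arbitrary user prompts flow through a fixed 65-character vocabulary.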
### Generation Strategy

- **Method:** autoregressive generation (greedy decoding)
- **Temperature:** not applicable (the next character is always the argmax)
- **Context Window:** up to 1024 characters
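The greedy loop described above can be sketched as follows. This is illustrative; `next_logits` is a hypothetical stand-in for a forward pass of the real model, and the function name is an assumption.

```python
# Sketch of greedy autoregressive generation with a sliding context window.
def generate_greedy(next_logits, context: list, max_new_tokens: int,
                    block_size: int = 1024) -> list:
    out = list(context)
    for _ in range(max_new_tokens):
        window = out[-block_size:]      # crop to the last block_size characters
        logits = next_logits(window)    # scores over the vocabulary for the next char
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax = greedy
        out.append(next_id)             # feed the choice back in as new context
    return out

# Toy stand-in "model": always scores (last id + 1) mod 5 highest.
toy = lambda window: [1.0 if i == (window[-1] + 1) % 5 else 0.0 for i in range(5)]
print(generate_greedy(toy, [0], 4))   # [0, 1, 2, 3, 4]
```

Because greedy decoding always takes the argmax, generation is deterministic: the same prompt always yields the same output.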
## 📊 Performance Notes

- **CPU Inference:** slower (may take 1-5 seconds per token)
- **GPU Inference:** faster (recommended for better performance)
- **Generation Speed:** depends on the hardware and the number of tokens requested

## 🛠️ Dependencies

- **torch:** PyTorch, for model definition and inference
- **gradio:** web interface framework
- **Optional:** a CUDA-enabled PyTorch build for GPU acceleration

## 📝 Notes

- The model is trained specifically on Shakespeare's works
- Generated text may not always be coherent (this depends on training quality)
- Character-level models are slower than subword models but offer fine-grained control
- The model weights are saved as a PyTorch state dictionary (`.pth` file)
## 🔮 Future Improvements

- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output
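As a sense of what the first improvement might look like, here is a hedged sketch of temperature and top-k sampling over a logits vector. This is not part of the project; the function name and signature are assumptions.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def sample_next(logits: torch.Tensor, temperature: float = 1.0,
                top_k: Optional[int] = None) -> int:
    """Sample the next token id from a (vocab_size,) logits vector (sketch)."""
    logits = logits / max(temperature, 1e-8)        # <1.0 sharpens, >1.0 flattens
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[-1]  # k-th largest logit
        # everything below the k-th largest logit becomes unsampleable
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Replacing the argmax in the generation loop with a call like this would trade the current deterministic output for more varied, controllable text.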
## 📄 License

This project is for educational purposes.

## 👤 Author

**Shivranjan Kolvankar**

---

## 🙏 Acknowledgments

- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- The PyTorch team for the deep learning framework
- The Gradio team for the web interface framework
- William Shakespeare for the training data

---

**Enjoy generating Shakespearean text! 🎭**