---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.

Prepared by: Shivranjan Kolvankar

## 📖 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.

## ✨ Features

  • From-scratch implementation of GPT architecture (no pre-trained weights)
  • Character-level tokenization (65-character vocabulary)
  • Gradio web interface for interactive text generation
  • Custom model architecture with configurable hyperparameters
  • Complete training pipeline with notebook-based training script

๐Ÿ—๏ธ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

  • Layers: 12 transformer blocks
  • Attention Heads: 12
  • Embedding Dimension: 936
  • Context Window (Block Size): 1024 tokens
  • Vocabulary Size: 65 characters
  • Dropout: 0.1
  • Parameters: ~85M
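
Collected as code, the hyperparameters above might look like the following (a minimal sketch; the field names are illustrative and not necessarily those used in the training notebook):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values taken from the specification above; names are illustrative.
    n_layer: int = 12       # transformer blocks
    n_head: int = 12        # attention heads
    n_embd: int = 936       # embedding dimension
    block_size: int = 1024  # context window
    vocab_size: int = 65    # character vocabulary
    dropout: float = 0.1

config = GPTConfig()
# Each attention head works on an equal slice of the embedding:
head_dim = config.n_embd // config.n_head  # 936 // 12 = 78
```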

### Architecture Components

  • Causal Self-Attention: Multi-head attention with causal masking
  • Feed-Forward Network (MLP): Two-layer MLP with GELU activation
  • Layer Normalization: Pre-norm architecture
  • Residual Connections: Skip connections around attention and MLP
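
The causal masking in the self-attention component can be illustrated without any framework: position `i` may attend only to positions `j <= i`, which corresponds to a lower-triangular mask. A conceptual sketch (not the actual implementation):

```python
def causal_mask(t):
    """Lower-triangular mask: row i marks the positions token i may attend to."""
    return [[1 if j <= i else 0 for j in range(t)] for i in range(t)]

# For a 4-token context, each row unlocks one more position:
mask = causal_mask(4)
# [[1, 0, 0, 0],
#  [1, 1, 0, 0],
#  [1, 1, 1, 0],
#  [1, 1, 1, 1]]
```

In the real model this mask is applied to the attention scores (masked positions are set to negative infinity before the softmax), so no token can look ahead.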

๐Ÿ“ Project Structure

app/
โ”œโ”€โ”€ app.py                      # Main Gradio application
โ”œโ”€โ”€ requirementx.txt            # Python dependencies
โ”œโ”€โ”€ models/
โ”‚   โ””โ”€โ”€ model_gpt2-124m.pth    # Trained model weights
โ”œโ”€โ”€ train/
โ”‚   โ””โ”€โ”€ GPT_2_124M_Model_From_Scratch.ipynb  # Training notebook
โ””โ”€โ”€ README.md                   # This file

## 🚀 Installation

### Prerequisites

  • Python 3.9 or higher
  • pip (Python package manager)

### Setup

  1. Clone the repository (or navigate to the project directory):

    cd app
    
  2. Create a virtual environment (recommended):

    python -m venv venv
    
  3. Activate the virtual environment:

    • Windows:
      venv\Scripts\activate
      
    • Linux/Mac:
      source venv/bin/activate
      
  4. Install dependencies:

    pip install -r requirements.txt
    

    Or manually install:

    pip install torch gradio
    

## 🎯 Usage

### Running the Application

  1. Ensure the model file exists:

    • The trained model should be located at models/model_gpt2-124m.pth
    • If not present, you'll need to train the model first (see Training section)
  2. Run the Gradio app:

    python app.py
    
  3. Access the web interface:

    • The app will start a local server
    • Open your browser and navigate to the URL shown in the terminal (typically http://127.0.0.1:7860)

### Using the Interface

  1. Enter a prompt in the text box (e.g., "JULIET:" or "My Name is shivranjan")
  2. Adjust Max New Tokens using the slider (50-1000 tokens, default: 300)
  3. Click Submit or press Enter to generate text
  4. View the generated text in the output box

### Example Prompts

  • JULIET:
  • ROMEO:
  • To be or not to be
  • My Name is shivranjan

## 🎓 Training

The model can be trained using the Jupyter notebook:

  1. Open the training notebook:

    • train/GPT_2_124M_Model_From_Scratch.ipynb
  2. Configure training parameters:

    • Set CONFIG_TYPE = 'gpt2-124m' for the full model
    • Adjust hyperparameters as needed (learning rate, batch size, etc.)
  3. Provide training data:

    • The notebook expects input.txt with Shakespeare's works
    • Update the data_file path in the notebook
  4. Run training:

    • Execute all cells in the notebook
    • Training will save the model to model_gpt2-124m.pth

### Training Configuration

The model was trained with the following hyperparameters:

  • Block Size: 1024
  • Batch Size: 16
  • Learning Rate: 1e-4
  • Max Iterations: 5000
  • Evaluation Interval: 100
  • Device: CUDA (GPU recommended) or CPU

## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

  • Newline: `\n`
  • Space: the single space character
  • Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
  • Digits: `3` (the only digit in the vocabulary)
  • Letters: `A-Z`, `a-z`

### Tokenization

  • Encoding: Character-level encoding (each character maps to an integer)
  • Decoding: Integer-to-character mapping
  • Unknown Characters: Characters not in the vocabulary are filtered out during encoding
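
A minimal sketch of this character-level scheme (variable names are illustrative; the app's actual code may differ):

```python
corpus = "To be, or not to be"
chars = sorted(set(corpus))                    # in the real model: 65 characters
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer
itos = {i: ch for ch, i in stoi.items()}       # integer -> character

def encode(s):
    # Characters outside the vocabulary are silently dropped, as noted above.
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(decode(encode("to be")))  # round-trips to "to be"
print(encode("to bé"))          # 'é' is not in the vocabulary, so it is filtered out
```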

### Generation Strategy

  • Method: Autoregressive generation (greedy decoding)
  • Temperature: N/A (uses argmax)
  • Context Window: Up to 1024 characters
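
Greedy decoding reduces to an argmax loop: feed the last `block_size` tokens to the model, take the highest-scoring next token, append it, and repeat. A framework-free sketch (the `next_logits` callable is a stand-in for the trained network, not the app's actual API):

```python
def generate(next_logits, ids, max_new_tokens, block_size=1024):
    """Greedy autoregressive generation.

    next_logits: callable mapping a context (list of token ids) to a list of
    logits over the vocabulary -- a stand-in for the trained model.
    """
    for _ in range(max_new_tokens):
        ctx = ids[-block_size:]                  # crop to the context window
        logits = next_logits(ctx)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids = ids + [next_id]
    return ids

# Toy "model" over a 3-token vocabulary that always favors token 2:
toy = lambda ctx: [0.1, 0.2, 0.7]
print(generate(toy, [0], 4))  # [0, 2, 2, 2, 2]
```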

## 📊 Performance Notes

  • CPU Inference: Slower (may take 1-5 seconds per token)
  • GPU Inference: Faster (recommended for better performance)
  • Generation Speed: Depends on hardware and number of tokens

๐Ÿ› ๏ธ Dependencies

  • torch: PyTorch for deep learning operations
  • gradio: Web interface framework
  • Optional: CUDA-enabled PyTorch for GPU acceleration

๐Ÿ“ Notes

  • The model is trained specifically on Shakespeare's works
  • Generated text may not always be coherent (depends on training quality)
  • Character-level models are slower but provide fine-grained control
  • The model weights are saved as a PyTorch state dictionary (.pth file)

## 🔮 Future Improvements

  • Add sampling strategies (temperature, top-k, top-p)
  • Implement beam search for better generation
  • Add support for custom training data
  • Optimize inference speed
  • Add model fine-tuning capabilities
  • Implement streaming generation for real-time output

## 📄 License

This project is for educational purposes.

## 👤 Author

Shivranjan Kolvankar


๐Ÿ™ Acknowledgments

  • Andrej Karpathy's nanoGPT for architecture inspiration
  • PyTorch team for the deep learning framework
  • Gradio team for the web interface framework
  • William Shakespeare for the training data

Enjoy generating Shakespearean text! 🎭