---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.

Prepared by: Shivranjan Kolvankar

## 📖 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.

## ✨ Features

  • From-scratch implementation of GPT architecture (no pre-trained weights)
  • Character-level tokenization (65-character vocabulary)
  • Gradio web interface for interactive text generation
  • Custom model architecture with configurable hyperparameters
  • Complete training pipeline with notebook-based training script

๐Ÿ—๏ธ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

  • Layers: 12 transformer blocks
  • Attention Heads: 12
  • Embedding Dimension: 936
  • Context Window (Block Size): 1024 tokens
  • Vocabulary Size: 65 characters
  • Dropout: 0.1
  • Parameters: ~85M
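
Collected as code, the hyperparameters above might look like the following (a minimal sketch; the field names are illustrative and not necessarily those used in the training notebook):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values taken from the specification above; names are illustrative.
    n_layer: int = 12       # transformer blocks
    n_head: int = 12        # attention heads
    n_embd: int = 936       # embedding dimension
    block_size: int = 1024  # context window
    vocab_size: int = 65    # character vocabulary
    dropout: float = 0.1

config = GPTConfig()
# Each attention head works on an equal slice of the embedding:
head_dim = config.n_embd // config.n_head  # 936 // 12 = 78
```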

### Architecture Components

  • Causal Self-Attention: Multi-head attention with causal masking
  • Feed-Forward Network (MLP): Two-layer MLP with GELU activation
  • Layer Normalization: Pre-norm architecture
  • Residual Connections: Skip connections around attention and MLP
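
The causal masking in the self-attention component can be illustrated without any framework: position `i` may attend only to positions `j <= i`, which corresponds to a lower-triangular mask. A conceptual sketch (not the actual implementation):

```python
def causal_mask(t):
    """Lower-triangular mask: row i marks the positions token i may attend to."""
    return [[1 if j <= i else 0 for j in range(t)] for i in range(t)]

# For a 4-token context, each row unlocks one more position:
mask = causal_mask(4)
# [[1, 0, 0, 0],
#  [1, 1, 0, 0],
#  [1, 1, 1, 0],
#  [1, 1, 1, 1]]
```

In the real model this mask is applied to the attention scores (masked positions are set to negative infinity before the softmax), so no token can look ahead.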

๐Ÿ“ Project Structure

app/
โ”œโ”€โ”€ app.py                      # Main Gradio application
โ”œโ”€โ”€ requirementx.txt            # Python dependencies
โ”œโ”€โ”€ models/
โ”‚   โ””โ”€โ”€ model_gpt2-124m.pth    # Trained model weights
โ”œโ”€โ”€ train/
โ”‚   โ””โ”€โ”€ GPT_2_124M_Model_From_Scratch.ipynb  # Training notebook
โ””โ”€โ”€ README.md                   # This file

## 🚀 Installation

### Prerequisites

  • Python 3.9 or higher
  • pip (Python package manager)

### Setup

  1. Clone the repository (or navigate to the project directory):

    cd app
    
  2. Create a virtual environment (recommended):

    python -m venv venv
    
  3. Activate the virtual environment:

    • Windows:
      venv\Scripts\activate
      
    • Linux/Mac:
      source venv/bin/activate
      
  4. Install dependencies:

    pip install -r requirements.txt
    

    Or manually install:

    pip install torch gradio
    

## 🎯 Usage

### Running the Application

  1. Ensure the model file exists:

    • The trained model should be located at models/model_gpt2-124m.pth
    • If not present, you'll need to train the model first (see Training section)
  2. Run the Gradio app:

    python app.py
    
  3. Access the web interface:

    • The app will start a local server
    • Open your browser and navigate to the URL shown in the terminal (typically http://127.0.0.1:7860)

### Using the Interface

  1. Enter a prompt in the text box (e.g., "JULIET:" or "My Name is shivranjan")
  2. Adjust Max New Tokens using the slider (50-1000 tokens, default: 300)
  3. Click Submit or press Enter to generate text
  4. View the generated text in the output box

### Example Prompts

  • JULIET:
  • ROMEO:
  • To be or not to be
  • My Name is shivranjan

## 🎓 Training

The model can be trained using the Jupyter notebook:

  1. Open the training notebook:

    • train/GPT_2_124M_Model_From_Scratch.ipynb
  2. Configure training parameters:

    • Set CONFIG_TYPE = 'gpt2-124m' for the full model
    • Adjust hyperparameters as needed (learning rate, batch size, etc.)
  3. Provide training data:

    • The notebook expects input.txt with Shakespeare's works
    • Update the data_file path in the notebook
  4. Run training:

    • Execute all cells in the notebook
    • Training will save the model to model_gpt2-124m.pth

### Training Configuration

The model was trained with the following hyperparameters:

  • Block Size: 1024
  • Batch Size: 16
  • Learning Rate: 1e-4
  • Max Iterations: 5000
  • Evaluation Interval: 100
  • Device: CUDA (GPU recommended) or CPU

## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

  • Newline: `\n`
  • Space: the single space character
  • Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
  • Digits: `3` (the only digit in the vocabulary)
  • Letters: `A-Z`, `a-z`

### Tokenization

  • Encoding: Character-level encoding (each character maps to an integer)
  • Decoding: Integer-to-character mapping
  • Unknown Characters: Characters not in the vocabulary are filtered out during encoding
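
A minimal sketch of this character-level scheme (variable names are illustrative; the app's actual code may differ):

```python
corpus = "To be, or not to be"
chars = sorted(set(corpus))                    # in the real model: 65 characters
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer
itos = {i: ch for ch, i in stoi.items()}       # integer -> character

def encode(s):
    # Characters outside the vocabulary are silently dropped, as noted above.
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(decode(encode("to be")))  # round-trips to "to be"
print(encode("to bé"))          # 'é' is not in the vocabulary, so it is filtered out
```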

### Generation Strategy

  • Method: Autoregressive generation (greedy decoding)
  • Temperature: N/A (uses argmax)
  • Context Window: Up to 1024 characters
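
Greedy decoding reduces to an argmax loop: feed the last `block_size` tokens to the model, take the highest-scoring next token, append it, and repeat. A framework-free sketch (the `next_logits` callable is a stand-in for the trained network, not the app's actual API):

```python
def generate(next_logits, ids, max_new_tokens, block_size=1024):
    """Greedy autoregressive generation.

    next_logits: callable mapping a context (list of token ids) to a list of
    logits over the vocabulary -- a stand-in for the trained model.
    """
    for _ in range(max_new_tokens):
        ctx = ids[-block_size:]                  # crop to the context window
        logits = next_logits(ctx)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids = ids + [next_id]
    return ids

# Toy "model" over a 3-token vocabulary that always favors token 2:
toy = lambda ctx: [0.1, 0.2, 0.7]
print(generate(toy, [0], 4))  # [0, 2, 2, 2, 2]
```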

## 📊 Performance Notes

  • CPU Inference: Slower (may take 1-5 seconds per token)
  • GPU Inference: Faster (recommended for better performance)
  • Generation Speed: Depends on hardware and number of tokens

๐Ÿ› ๏ธ Dependencies

  • torch: PyTorch for deep learning operations
  • gradio: Web interface framework
  • Optional: CUDA-enabled PyTorch for GPU acceleration

๐Ÿ“ Notes

  • The model is trained specifically on Shakespeare's works
  • Generated text may not always be coherent (depends on training quality)
  • Character-level models are slower but provide fine-grained control
  • The model weights are saved as a PyTorch state dictionary (.pth file)

## 🔮 Future Improvements

  • Add sampling strategies (temperature, top-k, top-p)
  • Implement beam search for better generation
  • Add support for custom training data
  • Optimize inference speed
  • Add model fine-tuning capabilities
  • Implement streaming generation for real-time output

## 📄 License

This project is for educational purposes.

## 👤 Author

Shivranjan Kolvankar


๐Ÿ™ Acknowledgments

  • Andrej Karpathy's nanoGPT for architecture inspiration
  • PyTorch team for the deep learning framework
  • Gradio team for the web interface framework
  • William Shakespeare for the training data

Enjoy generating Shakespearean text! 🎭