---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# Shakespeare GPT 🎭
A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.
**Prepared by:** Shivranjan Kolvankar
## 📖 Overview
This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.
## ✨ Features
- **From-scratch implementation** of GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with notebook-based training script
## ๐Ÿ—๏ธ Model Architecture
The model follows the GPT-2 architecture with the following specifications:
- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M
### Architecture Components
- **Causal Self-Attention:** Multi-head attention with causal masking
- **Feed-Forward Network (MLP):** Two-layer MLP with GELU activation
- **Layer Normalization:** Pre-norm architecture
- **Residual Connections:** Skip connections around attention and MLP
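The causal self-attention component can be illustrated with a minimal single-head sketch in plain Python (the real model uses batched, multi-head tensor operations in PyTorch; this just shows the masking logic):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats), each of dimension d.
    Position t may only attend to positions 0..t.
    """
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # Scores against keys 0..t only; future positions are masked out.
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(d)])
    return out
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector.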
## ๐Ÿ“ Project Structure
```
app/
โ”œโ”€โ”€ app.py # Main Gradio application
โ”œโ”€โ”€ requirementx.txt # Python dependencies
โ”œโ”€โ”€ models/
โ”‚ โ””โ”€โ”€ model_gpt2-124m.pth # Trained model weights
โ”œโ”€โ”€ train/
โ”‚ โ””โ”€โ”€ GPT_2_124M_Model_From_Scratch.ipynb # Training notebook
โ””โ”€โ”€ README.md # This file
```
## 🚀 Installation
### Prerequisites
- Python 3.9 or higher
- pip (Python package manager)
### Setup
1. **Clone the repository** (or navigate to the project directory):
```bash
cd app
```
2. **Create a virtual environment** (recommended):
```bash
python -m venv venv
```
3. **Activate the virtual environment**:
- **Windows:**
```bash
venv\Scripts\activate
```
- **Linux/Mac:**
```bash
source venv/bin/activate
```
4. **Install dependencies**:
```bash
pip install -r requirements.txt
```
Or manually install:
```bash
pip install torch gradio
```
## 🎯 Usage
### Running the Application
1. **Ensure the model file exists**:
- The trained model should be located at `models/model_gpt2-124m.pth`
- If not present, you'll need to train the model first (see Training section)
2. **Run the Gradio app**:
```bash
python app.py
```
3. **Access the web interface**:
- The app will start a local server
- Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`)
### Using the Interface
1. **Enter a prompt** in the text box (e.g., "JULIET:" or "My Name is shivranjan")
2. **Adjust Max New Tokens** using the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box
### Example Prompts
- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`
## 🎓 Training
The model can be trained using the Jupyter notebook:
1. **Open the training notebook**:
- `train/GPT_2_124M_Model_From_Scratch.ipynb`
2. **Configure training parameters**:
- Set `CONFIG_TYPE = 'gpt2-124m'` for the full model
- Adjust hyperparameters as needed (learning rate, batch size, etc.)
3. **Provide training data**:
- The notebook expects `input.txt` with Shakespeare's works
- Update the `data_file` path in the notebook
4. **Run training**:
- Execute all cells in the notebook
- Training will save the model to `model_gpt2-124m.pth`
### Training Configuration
The model was trained with the following hyperparameters:
- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU
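The hyperparameters above can be collected into a config dictionary like the following sketch (the key names here are illustrative; the actual notebook may organize its configuration differently):

```python
# Illustrative config; key names are assumptions, not the notebook's exact API.
config = {
    "block_size": 1024,      # context window
    "batch_size": 16,
    "learning_rate": 1e-4,
    "max_iters": 5000,
    "eval_interval": 100,
    "n_layer": 12,           # transformer blocks
    "n_head": 12,            # attention heads
    "n_embd": 936,           # embedding dimension
    "vocab_size": 65,
    "dropout": 0.1,
}
```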
## 🔧 Technical Details
### Character Vocabulary
The model uses a 65-character vocabulary:
- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Digits: `3` (the only digit that appears in the corpus)
- Letters: `A-Z`, `a-z`
### Tokenization
- **Encoding:** Character-level encoding (each character maps to an integer)
- **Decoding:** Integer-to-character mapping
- **Unknown Characters:** Characters not in the vocabulary are filtered out during encoding
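The encode/decode behavior described above can be sketched in a few lines of plain Python (the real model builds its 65-character vocabulary from the full Shakespeare corpus; a short stand-in string is used here):

```python
# Build the vocabulary from the training text (tiny stand-in corpus here).
text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> character

def encode(s):
    """Map each known character to its id; unknown characters are dropped."""
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids):
    """Map integer ids back to characters."""
    return "".join(itos[i] for i in ids)
```

A round trip over in-vocabulary text is lossless, while out-of-vocabulary characters are silently filtered out during encoding.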
### Generation Strategy
- **Method:** Autoregressive generation (greedy decoding)
- **Temperature:** N/A (uses argmax)
- **Context Window:** Up to 1024 characters
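The greedy autoregressive loop amounts to: crop the context to the block size, score the next token, append the argmax, repeat. A minimal sketch with a toy scoring function (the real app calls the trained model for logits):

```python
def generate(logits_fn, context, max_new_tokens, block_size=8):
    """Greedy autoregressive generation: repeatedly append the argmax token.

    logits_fn(context) -> list of scores over the vocabulary for the next token.
    """
    out = list(context)
    for _ in range(max_new_tokens):
        ctx = out[-block_size:]  # crop to the context window
        logits = logits_fn(ctx)
        out.append(max(range(len(logits)), key=logits.__getitem__))  # argmax
    return out

# Toy "model" over a 5-token vocabulary: always prefers last_token + 1 (mod 5).
toy = lambda ctx: [1.0 if i == (ctx[-1] + 1) % 5 else 0.0 for i in range(5)]
```

With the toy scorer, `generate(toy, [0], 4)` produces the deterministic sequence `[0, 1, 2, 3, 4]`, which is exactly the behavior of argmax decoding: no randomness, same output every run.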
## 📊 Performance Notes
- **CPU Inference:** Slower (may take 1-5 seconds per token)
- **GPU Inference:** Faster (recommended for better performance)
- **Generation Speed:** Depends on hardware and number of tokens
## ๐Ÿ› ๏ธ Dependencies
- **torch:** PyTorch for deep learning operations
- **gradio:** Web interface framework
- **Optional:** CUDA-enabled PyTorch for GPU acceleration
## ๐Ÿ“ Notes
- The model is trained specifically on Shakespeare's works
- Generated text may not always be coherent (depends on training quality)
- Character-level models are slower but provide fine-grained control
- The model weights are saved as a PyTorch state dictionary (`.pth` file)
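Saving and restoring a state dictionary follows the standard PyTorch pattern. A self-contained round-trip sketch with a stand-in module (the real app would instantiate its GPT class and load `models/model_gpt2-124m.pth` instead):

```python
import torch
import torch.nn as nn

# Stand-in module; the real app instantiates its GPT model class here.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

net = TinyNet()
torch.save(net.state_dict(), "weights.pth")   # what the training notebook does

restored = TinyNet()
restored.load_state_dict(torch.load("weights.pth", map_location="cpu"))
restored.eval()  # disable dropout before generation
```

`map_location="cpu"` lets weights trained on a GPU load on a CPU-only machine, and `eval()` matters here because the architecture uses dropout.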
## 🔮 Future Improvements
- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output
## 📄 License
This project is for educational purposes.
## 👤 Author
**Shivranjan Kolvankar**
---
## ๐Ÿ™ Acknowledgments
- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- PyTorch team for the deep learning framework
- Gradio team for the web interface framework
- William Shakespeare for the training data
---
**Enjoy generating Shakespearean text! 🎭**