---
title: Shakespeare GPT
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---


# Shakespeare GPT 🎭

A character-level GPT model trained from scratch on Shakespeare's works, implemented using PyTorch and served via Gradio.

**Prepared by:** Shivranjan Kolvankar

## 📖 Overview

This project implements a Generative Pre-trained Transformer (GPT) model from scratch, trained on Shakespeare's complete works. The model generates text character-by-character, maintaining the style and vocabulary of Shakespearean English.

## ✨ Features

- **From-scratch implementation** of GPT architecture (no pre-trained weights)
- **Character-level tokenization** (65-character vocabulary)
- **Gradio web interface** for interactive text generation
- **Custom model architecture** with configurable hyperparameters
- **Complete training pipeline** with notebook-based training script

## ๐Ÿ—๏ธ Model Architecture

The model follows the GPT-2 architecture with the following specifications:

- **Layers:** 12 transformer blocks
- **Attention Heads:** 12
- **Embedding Dimension:** 936
- **Context Window (Block Size):** 1024 tokens
- **Vocabulary Size:** 65 characters
- **Dropout:** 0.1
- **Parameters:** ~85M

### Architecture Components

- **Causal Self-Attention:** Multi-head attention with causal masking
- **Feed-Forward Network (MLP):** Two-layer MLP with GELU activation
- **Layer Normalization:** Pre-norm architecture
- **Residual Connections:** Skip connections around attention and MLP
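The components above can be sketched as a single pre-norm transformer block in PyTorch. This is a minimal illustration, not the exact code from the training notebook; class and argument names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with causal masking."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # split into heads: (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal mask: each position attends only to itself and earlier positions
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Pre-norm block: x + attn(ln1(x)), then x + mlp(ln2(x))."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(           # two-layer MLP with GELU
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around MLP
        return x
```

The full model stacks 12 such blocks (with `n_embd=936`, `n_head=12`) on top of token and position embeddings.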

## ๐Ÿ“ Project Structure

```
app/
├── app.py                      # Main Gradio application
├── requirementx.txt            # Python dependencies
├── models/
│   └── model_gpt2-124m.pth     # Trained model weights
├── train/
│   └── GPT_2_124M_Model_From_Scratch.ipynb  # Training notebook
└── README.md                   # This file
```

## 🚀 Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package manager)

### Setup

1. **Clone the repository** (or navigate to the project directory):
   ```bash
   cd app
   ```

2. **Create a virtual environment** (recommended):
   ```bash
   python -m venv venv
   ```

3. **Activate the virtual environment**:
   - **Windows:**
     ```bash
     venv\Scripts\activate
     ```
   - **Linux/Mac:**
     ```bash
     source venv/bin/activate
     ```


4. **Install dependencies**:
   ```bash
   pip install -r requirementx.txt
   ```

   Or manually install:
   ```bash
   pip install torch gradio
   ```

## 🎯 Usage

### Running the Application

1. **Ensure the model file exists**:
   - The trained model should be located at `models/model_gpt2-124m.pth`
   - If not present, you'll need to train the model first (see Training section)

2. **Run the Gradio app**:
   ```bash
   python app.py
   ```

3. **Access the web interface**:
   - The app will start a local server
   - Open your browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`)

### Using the Interface

1. **Enter a prompt** in the text box (e.g., "JULIET:" or "My Name is shivranjan")
2. **Adjust Max New Tokens** using the slider (50-1000 tokens, default: 300)
3. **Click Submit** or press Enter to generate text
4. **View the generated text** in the output box

### Example Prompts

- `JULIET:`
- `ROMEO:`
- `To be or not to be`
- `My Name is shivranjan`

## 🎓 Training

The model can be trained using the Jupyter notebook:

1. **Open the training notebook**:
   - `train/GPT_2_124M_Model_From_Scratch.ipynb`

2. **Configure training parameters**:
   - Set `CONFIG_TYPE = 'gpt2-124m'` for the full model
   - Adjust hyperparameters as needed (learning rate, batch size, etc.)

3. **Provide training data**:
   - The notebook expects `input.txt` with Shakespeare's works
   - Update the `data_file` path in the notebook

4. **Run training**:
   - Execute all cells in the notebook
   - Training will save the model to `model_gpt2-124m.pth`

### Training Configuration

The model was trained with the following hyperparameters:

- **Block Size:** 1024
- **Batch Size:** 16
- **Learning Rate:** 1e-4
- **Max Iterations:** 5000
- **Evaluation Interval:** 100
- **Device:** CUDA (GPU recommended) or CPU
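Put together, these hyperparameters drive a standard training loop along the following lines. This is an illustrative skeleton, not the notebook's exact code; it assumes a `model` returning per-position logits and a `get_batch` helper like the notebook's:

```python
import torch
import torch.nn.functional as F

def train(model, get_batch, max_iters=5000, eval_interval=100,
          lr=1e-4, ckpt_path=None):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(max_iters):
        xb, yb = get_batch("train")  # (batch_size, block_size) int tensors
        logits = model(xb)           # (batch_size, block_size, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % eval_interval == 0:
            print(f"step {step}: loss {loss.item():.4f}")
    if ckpt_path is not None:
        torch.save(model.state_dict(), ckpt_path)  # e.g. "model_gpt2-124m.pth"
```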

## 🔧 Technical Details

### Character Vocabulary

The model uses a 65-character vocabulary:

- Newline: `\n`
- Space: ` `
- Punctuation: `!`, `$`, `&`, `'`, `,`, `-`, `.`, `:`, `;`, `?`
- Digits: `3` (the only digit that appears in the corpus)
- Letters: `A-Z`, `a-z`

### Tokenization

- **Encoding:** Character-level encoding (each character maps to an integer)
- **Decoding:** Integer-to-character mapping
- **Unknown Characters:** Characters not in the vocabulary are filtered out during encoding
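A character-level codec of this kind is only a few lines of plain Python. In this simplified sketch the corpus string is a stand-in for the real `input.txt`:

```python
# Stand-in corpus; the real vocabulary is built from input.txt.
text = "To be, or not to be: that is the question."
chars = sorted(set(text))                     # sorted, de-duplicated characters
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> char

def encode(s):
    """Map characters to integers, dropping out-of-vocabulary characters."""
    return [stoi[ch] for ch in s if ch in stoi]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

For example, `decode(encode("to be"))` round-trips to `"to be"`, while characters absent from the corpus are silently dropped.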

### Generation Strategy

- **Method:** Autoregressive generation (greedy decoding)
- **Temperature:** N/A (uses argmax)
- **Context Window:** Up to 1024 characters
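Greedy autoregressive decoding of this sort can be sketched as follows (an illustrative helper, assuming `model(idx)` returns per-position logits; the names are hypothetical):

```python
import torch

@torch.no_grad()
def generate_greedy(model, idx, max_new_tokens, block_size=1024):
    """Append the argmax token one step at a time."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]  # crop to the context window
        logits = model(idx_cond)         # (B, T, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)  # feed the choice back in
    return idx
```

Swapping the `argmax` for temperature-scaled multinomial sampling is the usual first step toward the sampling strategies listed under Future Improvements.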

## 📊 Performance Notes

- **CPU Inference:** Slower (may take 1-5 seconds per token)
- **GPU Inference:** Faster (recommended for better performance)
- **Generation Speed:** Depends on hardware and number of tokens
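The standard PyTorch idiom for honoring this CPU/GPU distinction is a one-line device check:

```python
import torch

# Prefer the GPU when CUDA is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# The model and input tensors must both be moved with .to(device).
```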

## ๐Ÿ› ๏ธ Dependencies

- **torch:** PyTorch for deep learning operations
- **gradio:** Web interface framework
- **Optional:** CUDA-enabled PyTorch for GPU acceleration

## ๐Ÿ“ Notes

- The model is trained specifically on Shakespeare's works
- Generated text may not always be coherent (depends on training quality)
- Character-level models are slower but provide fine-grained control
- The model weights are saved as a PyTorch state dictionary (`.pth` file)

## 🔮 Future Improvements

- Add sampling strategies (temperature, top-k, top-p)
- Implement beam search for better generation
- Add support for custom training data
- Optimize inference speed
- Add model fine-tuning capabilities
- Implement streaming generation for real-time output

## 📄 License

This project is for educational purposes.

## 👤 Author

**Shivranjan Kolvankar**

---

## ๐Ÿ™ Acknowledgments

- Andrej Karpathy's [nanoGPT](https://github.com/karpathy/nanoGPT) for architecture inspiration
- PyTorch team for the deep learning framework
- Gradio team for the web interface framework
- William Shakespeare for the training data

---

**Enjoy generating Shakespearean text! 🎭**