Feat: Updated to 124M model

Files changed:
- README.md (+7 -12)
- app.py (+7 -7)
- nano_gpt_model.pt (+2 -2)
README.md
CHANGED

@@ -19,10 +19,10 @@ This section focuses on Embeddings and Pre-training.
 
 In this project, a GPT (decoder-only) model is trained on Shakespeare data. The model architecture follows the original GPT design with multi-head self-attention and feed-forward layers. Key specifications include:
 
-
-
-
-
+- 12 transformer layers
+- 12 attention heads
+- 768 embedding dimensions
+- 1024 context window size
 - ~50k vocabulary size
 
 The model is trained using cross-entropy loss and AdamW optimizer with weight decay. Training is done on Shakespeare's works to learn the language patterns and writing style. The trained model can generate Shakespeare-style text given a prompt.

@@ -31,7 +31,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Project Structure
 
-```
+```bash
 .
 ├── assets            # Images for README
 ├── nano_gpt_model.pt # Trained model

@@ -45,7 +45,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Install Dependencies
 
-```
+```bash
 pip install -r requirements.txt
 ```

@@ -53,7 +53,7 @@ pip install -r requirements.txt
 
 ### Run the Notebook
 
-```
+```bash
 jupyter notebook S12Trained.ipynb
 ```

@@ -136,9 +136,4 @@ For the ground is nothing henceforth fell executioner come
 
-### Try it out
-
-App Link: https://huggingface.co/spaces/Shilpaj/ShakespeareGPT
-
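The spec list added in the README diff above (12 layers, 12 heads, 768-dim embeddings, 1024-token context, ~50k vocabulary) matches GPT-2 small, and the "124M" in the commit title can be sanity-checked by summing the parameter tensors. A minimal pure-Python sketch, assuming the standard GPT-2 layout with tied input/output embeddings:

```python
# Sanity-check the "124M" figure from the commit title using the README
# specs. Assumes the standard GPT-2 layout: learned position embeddings,
# fused Q/K/V projection, 4x MLP expansion, tied lm_head/wte weights.

n_layer, n_embd = 12, 768
block_size, vocab_size = 1024, 50257  # 50257 is GPT-2's BPE vocab ("~50k")

wte = vocab_size * n_embd          # token embeddings (shared with lm_head)
wpe = block_size * n_embd          # learned position embeddings

per_block = (
    2 * n_embd                            # ln_1 (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd    # attn.c_attn: fused Q/K/V
    + n_embd * n_embd + n_embd            # attn.c_proj
    + 2 * n_embd                          # ln_2
    + n_embd * 4 * n_embd + 4 * n_embd    # mlp.c_fc: 4x expansion
    + 4 * n_embd * n_embd + n_embd        # mlp.c_proj
)

total = wte + wpe + n_layer * per_block + 2 * n_embd  # + final LayerNorm
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# 124,439,808 parameters (~124M)
```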
app.py
CHANGED

@@ -11,11 +11,11 @@ import spaces
 # Configuration class (same as in training)
 @dataclass
 class GPTConfig:
-    block_size: int =
-    vocab_size: int =
-    n_layer: int =
-    n_head: int =
-    n_embd: int =
+    block_size: int = 1024
+    vocab_size: int = 50257
+    n_layer: int = 12
+    n_head: int = 12
+    n_embd: int = 768
 
 # Model architecture classes (copied from training notebook)
 class CausalSelfAttention(nn.Module):

@@ -154,8 +154,8 @@ model, device = load_model()
 demo = gr.Interface(
     fn=generate_text,
     inputs=[
-        gr.Textbox(label="Enter your prompt", value="
-        gr.Slider(minimum=1, maximum=
+        gr.Textbox(label="Enter your prompt", value="Thou shalt"),
+        gr.Slider(minimum=1, maximum=1024, value=100, step=1, label="Number of tokens to generate"),
         gr.Slider(minimum=0.1, maximum=2.0, value=0.8, step=0.1, label="Temperature (higher = more random)")
     ],
     outputs=gr.Textbox(label="Generated Text"),
nano_gpt_model.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c1799438dae530e76e501535d8c2431c7658ed6354d9dd537dcb6c3c1ac86ab8
+size 548148666