Update README.md
README.md (changed)
@@ -13,7 +13,7 @@ This is a **learning project** demonstrating how to train a transformer-based la
 
 - **Model Type:** Character-level Transformer Language Model
 - **Architecture:** 6-layer Transformer Encoder with causal masking
-- **Parameters:** ~
+- **Parameters:** ~4M parameters
 - **Training Data:** Shakespeare's plays (~1.1M characters)
 - **Framework:** PyTorch
 - **Training Time:** ~8 hours on single GPU
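For reference, here is a minimal PyTorch sketch of what the spec above describes: a character-level LM built from a 6-layer `nn.TransformerEncoder` with a causal mask. The dimensions (`d_model=256`, `nhead=8`, `block_size=256`) are assumptions, not the repository's actual hyperparameters; with these values the model comes out around 4–5M parameters, in the ballpark of the stated ~4M.

```python
import torch
import torch.nn as nn

class CharTransformerLM(nn.Module):
    """Sketch of a character-level LM: token + position embeddings
    feeding a 6-layer TransformerEncoder with a causal mask.
    All dimensions are assumed, chosen to land near ~4M parameters."""

    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=6,
                 block_size=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq) tensor of character ids, seq <= block_size
        t = idx.size(1)
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: position i may only attend to positions <= i
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        x = self.encoder(x, mask=mask)
        return self.lm_head(x)  # (batch, seq, vocab_size) logits

# Hypothetical usage, building the character vocabulary from the corpus
# (a Shakespeare char vocabulary is typically ~65 symbols):
# chars = sorted(set(open("shakespeare.txt").read()))
# model = CharTransformerLM(vocab_size=len(chars))
```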
@@ -138,7 +138,7 @@ What light through yonder window breaks?
 - ❌ Not suitable for production use
 
 ### What This Model Is NOT
-- ❌ Not comparable to GPT-2, GPT-3, or modern LLMs
+- ❌ Not comparable to GPT-2, GPT-3, or modern LLMs (GPT-2 Small has 117M, ~30x larger)
 - ❌ Not fine-tuned for instruction following
 - ❌ Not suitable for serious text generation applications
 - ❌ Not production-ready
@@ -184,7 +184,7 @@ This project was an educational exercise in:
 
 | Model | Parameters | Quality |
 |-------|------------|---------|
-| This Model |
+| This Model | 4M | Low (educational) |
 | GPT-2 Small | 117M | High |
 | GPT-3 | 175B | Very High |
 
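The "~30x larger" note added in the earlier hunk follows directly from the table's own figures; a one-line check:

```python
# Back-of-envelope check of the size gap, using the README's numbers
gpt2_small_params = 117e6   # GPT-2 Small
this_model_params = 4e6     # this project
print(gpt2_small_params / this_model_params)  # 29.25 -> roughly 30x
```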