Update README.md
README.md
@@ -53,7 +53,7 @@ Layers 11-16: Full Attention Blocks
 | **Hidden Dimension** | 512 | 512 |
 | **Vocabulary Size** | 4,466 | 35,560 |
 | **Training Dataset** | TinyChat only | TinyStories + TinyChat + HQ Sentences |
-| **Total Tokens** | ~1M conversations | 3M+ tokens |
+| **Total Tokens** | ~1M conversations | ~3M+ tokens |
 | **Final Loss** | ~2.0 | ~2.0 |
 | **Final Perplexity** | 7.29-9.70 | 7.29-10.0 |
 | **Training Time** | ~17 hours | ~2-4 hours |
@@ -127,11 +127,10 @@ Layers 11-16: Full Attention Blocks
 
 ### Performance Metrics
 
-| Metric | Initial | Final |
-
-| Training Loss | ~
-| Perplexity | ~
-
+| Metric | Initial | Final |
+|--------|---------|-------|
+| Training Loss | ~10.0 | ~1.7 |
+| Perplexity | ~4000+ | ~6 |
 
 
 > [!NOTE]
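The README tables report both a loss and a perplexity. The following is a minimal sketch that shows how the two relate and what loss an untrained model with a uniform output distribution would start from at each vocabulary size. It assumes the common convention that perplexity is the exponential of the mean per-token cross-entropy measured in nats, which the diff does not state. The helper names `perplexity_from_loss` and `uniform_baseline_loss` are hypothetical illustrations, not part of the project's code.

```python
import math


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Perplexity as exp of the mean per-token cross-entropy (natural log).

    Hypothetical helper: assumes the README's loss is measured in nats.
    """
    return math.exp(mean_ce_loss)


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy (nats) of a model that gives every token equal probability."""
    return math.log(vocab_size)


if __name__ == "__main__":
    # Loss values taken from the README tables in this diff.
    for loss in (2.0, 1.7):
        print(f"loss {loss:.1f} -> perplexity {perplexity_from_loss(loss):.2f}")

    # Vocabulary sizes from the model comparison table.
    for vocab in (4_466, 35_560):
        print(f"vocab {vocab}: uniform-guess loss {uniform_baseline_loss(vocab):.2f}")
```

Under that assumption, a loss of ~2.0 maps to a perplexity of about 7.4, and ~1.7 maps to about 5.5, which is close to the reported ~6. The value ln(35,560) ≈ 10.5 is in line with an initial training loss around 10.0 for a 35,560-token vocabulary.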