Model card updated after epoch 1
README.md CHANGED

@@ -17,8 +17,8 @@ metrics: [loss, lm_loss, ponder_loss, perplexity_lm]
 Note: This model uses the T5 SentencePiece tokenizer. Perplexity numbers on WT103
 reported here are not directly comparable to GPT-2 BPE-based PPLs.
 
-### Latest Performance (Epoch
-- **Validation Loss**: 4.
-- **Validation LM Loss**: 4.
-- **Validation Ponder Loss**: 1.
-- **Validation Perplexity (LM-only)**:
+### Latest Performance (Epoch 1)
+- **Validation Loss**: 4.8248
+- **Validation LM Loss**: 4.8149
+- **Validation Ponder Loss**: 1.0064
+- **Validation Perplexity (LM-only)**: 123.34
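As a sanity check on the new numbers, the reported LM-only perplexity is consistent with the exponential of the reported LM loss — the standard relation, assuming the loss is mean token-level cross-entropy in nats (an assumption; the README does not state the loss units):

```python
import math

# Values from the updated model card.
lm_loss = 4.8149        # Validation LM Loss (assumed: mean cross-entropy, nats/token)
reported_ppl = 123.34   # Validation Perplexity (LM-only)

# Perplexity = exp(cross-entropy loss).
computed_ppl = math.exp(lm_loss)
print(f"{computed_ppl:.2f}")  # ≈ 123.34, matching the reported value
```

As the card's note warns, this perplexity is computed over T5 SentencePiece tokens, so it is not directly comparable to WT103 perplexities reported over GPT-2 BPE tokens: the two tokenizers segment the same text into different numbers of tokens, which changes the per-token average.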