Update README.md
README.md
CHANGED
@@ -111,7 +111,7 @@ As I expected, it improves GSM8K, but doesn't do much to ARC.
 - Training sequence length: 256
 - Input masking probability: 40%
 - Label masking probability: 10%
-- Answer-only (full rationale masking) probability: 10%
+- Answer-only (full rationale label masking) probability: 10%
 - Batch size: 16, accumulated to 256
 - Epochs: 6
 - Learning rate: 1e-5
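The reworded bullet specifies that answer-only mode masks the rationale's labels, not its input tokens. The README does not show the training code, so the following is only a rough sketch of how these settings might be applied. The function names, the `rationale_span` argument, the per-token reading of "label masking probability", and the choice to apply it only to rationale tokens are all assumptions made for illustration. The only external fact relied on is that `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. Input masking (40%) is left out because the README does not say how it is implemented.

```python
import random

# Default ignore_index of PyTorch's CrossEntropyLoss: positions with this
# label contribute nothing to the loss.
IGNORE_INDEX = -100

# Values copied from the README hunk.
LABEL_MASK_PROB = 0.10
ANSWER_ONLY_PROB = 0.10
MICRO_BATCH_SIZE = 16
EFFECTIVE_BATCH_SIZE = 256


def accumulation_steps(micro_batch, effective_batch):
    """Number of micro-batches to accumulate before one optimizer step."""
    if effective_batch % micro_batch != 0:
        raise ValueError("effective batch must be a multiple of the micro-batch")
    return effective_batch // micro_batch


def mask_labels(token_ids, rationale_span, rng,
                label_mask_prob=LABEL_MASK_PROB,
                answer_only_prob=ANSWER_ONLY_PROB):
    """Hypothetical helper: build a label list from token_ids.

    rationale_span is a (start, end) pair of indices, end exclusive, marking
    the rationale tokens. The input tokens themselves are never changed;
    only the returned labels are masked.
    """
    labels = list(token_ids)
    start, end = rationale_span

    # Answer-only mode (assumed to be chosen per example): hide every
    # rationale label so the loss is computed on the answer tokens only.
    if rng.random() < answer_only_prob:
        for i in range(start, end):
            labels[i] = IGNORE_INDEX
        return labels

    # Otherwise mask each rationale label independently (assumed per token).
    for i in range(start, end):
        if rng.random() < label_mask_prob:
            labels[i] = IGNORE_INDEX
    return labels
```

With the listed batch settings, `accumulation_steps(MICRO_BATCH_SIZE, EFFECTIVE_BATCH_SIZE)` gives 16 micro-batches per optimizer step. Passing `answer_only_prob=1.0` forces answer-only mode, which is a convenient way to check that only the labels inside `rationale_span` are replaced while the input sequence is left intact.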