Aidan Mannion committed
Commit ee1b6d6 · Parent(s): e0b4b0e
Update README.md
README.md CHANGED
```diff
@@ -63,6 +63,7 @@ Experiments on general-domain data suggest that, given its specialised training
 - linear learning rate schedule with 10,770 warmup steps
 - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
 - MLM masking probability 0.15
+
 **Training regime:** The model was trained with fp16 non-mixed precision, using the AdamW optimizer with default parameters.
 
 
```
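For readers who want to wire up the same hyperparameters, here is a minimal sketch using the Hugging Face `transformers` Trainer; it is not the authors' training script. The checkpoint name, output directory, and dataset are placeholders not taken from this commit, and since the Trainer's `fp16=True` flag enables *mixed* precision, the "non-mixed" fp16 described in the README is approximated here by casting the model with `.half()`.

```python
# Sketch only: reproduces the hyperparameters from the diff above, with
# placeholder checkpoint/dataset names (the commit does not specify them).
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model = model.half()  # approximates fp16 non-mixed precision; fp16=True alone would be mixed-precision AMP

# MLM masking probability 0.15
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="mlm-out",               # placeholder
    lr_scheduler_type="linear",         # linear learning rate schedule
    warmup_steps=10_770,                # 10,770 warmup steps
    per_device_train_batch_size=15,     # 15 sequences per batch
    gradient_accumulation_steps=100,    # x 100 -> effective batch size 1500
)

# The Trainer defaults to the AdamW optimizer with default parameters.
# trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=...)
# trainer.train()
```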