## Training details

The model was trained on 8 A100 80GB GPUs for approximately 50 hours.

| Hyperparameters              |    Value    |
| :--------------------------- | :---------: |
| per_device_train_batch_size  |      8      |
| gradient_accumulation_steps  |      1      |
| epochs                       |      3      |
| steps                        |    8628     |
| learning_rate                |    2e-5     |
| lr scheduler type            |   cosine    |
| warmup ratio                 |     0.1     |
| optimizer                    |    adamw    |
| fp16                         |    True     |
| GPU                          | 8 A100 80GB |
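As a sanity check on the numbers in the table, the effective batch size and the approximate dataset size implied by the step count can be derived directly from the listed hyperparameters. This is a minimal sketch, assuming standard data parallelism across the 8 GPUs (one process per device); the dataset-size figure is an inference from the table, not a stated fact.

```python
# Derive effective batch size and implied dataset size from the
# hyperparameters above. num_gpus comes from the "GPU" row (8 A100 80GB);
# the rest are taken verbatim from the table.

num_gpus = 8
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
epochs = 3
total_steps = 8628

# Samples consumed per optimizer step across all devices.
effective_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)

steps_per_epoch = total_steps // epochs

# Approximate number of training samples implied by the step count.
implied_dataset_size = steps_per_epoch * effective_batch_size

print(effective_batch_size, steps_per_epoch, implied_dataset_size)
# → 64 2876 184064
```

So 8628 steps over 3 epochs at an effective batch size of 64 corresponds to roughly 184k training samples per epoch.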
### Important Note