Commit 082b1b2
Parent(s): 2388414

Update README.md

README.md CHANGED

```diff
@@ -63,11 +63,17 @@ The following techniques were used to shorten training time:
 - **Using EMA only in the last phase of training**
 
 ### Additional Details
-
-- **Hardware:**
-- **Optimizer:** AdamW
-- **Batch:** 8192
-- **Learning rate:** 1e-4
+#### Phase 1
+- **Hardware:** 8 x 8 x A100 (80gb)
+- **Optimizer:** AdamW
+- **Batch:** 8192
+- **Learning rate:** 1e-4
+
+#### Phase 2-4
+- **Hardware:** 8 x 8 x H100 (80gb)
+- **Optimizer:** LAMB
+- **Batch:** 6144
+- **Learning rate:** 5e-3
 
 ## Evaluation
 
```
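The updated README lists each training phase's settings as bullets. The sketch below captures those settings as structured config and derives a per-GPU batch size from them. It is illustrative only and rests on several assumptions that the README does not state:

- "8 x 8 x GPU" means 8 nodes with 8 GPUs each, for 64 GPUs in total.
- "Batch" is the global batch size.
- The global batch is split evenly across GPUs.

`PhaseConfig`, `PHASES`, and `per_gpu_batch` are hypothetical names. They are not taken from this repository's training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Settings for one training phase, copied from the README bullets."""
    name: str
    nodes: int          # assumption: first factor of "8 x 8 x GPU" is node count
    gpus_per_node: int  # assumption: second factor is GPUs per node
    gpu: str
    optimizer: str
    global_batch: int   # assumption: "Batch" is the global batch size
    learning_rate: float

    @property
    def world_size(self) -> int:
        """Total number of GPUs (data-parallel workers) under the assumption above."""
        return self.nodes * self.gpus_per_node

    def per_gpu_batch(self, grad_accum_steps: int = 1) -> int:
        """Per-GPU micro-batch, assuming an even split of the global batch.

        With gradient accumulation, each optimizer step is spread over
        `grad_accum_steps` forward/backward passes, so the micro-batch shrinks.
        """
        denom = self.world_size * grad_accum_steps
        if self.global_batch % denom:
            raise ValueError(f"global batch {self.global_batch} is not divisible by {denom}")
        return self.global_batch // denom


# Values as listed in the README; the node/GPU split is an interpretation.
PHASES = [
    PhaseConfig("Phase 1", 8, 8, "A100 (80gb)", "AdamW", 8192, 1e-4),
    PhaseConfig("Phase 2-4", 8, 8, "H100 (80gb)", "LAMB", 6144, 5e-3),
]

if __name__ == "__main__":
    for phase in PHASES:
        print(
            f"{phase.name}: {phase.world_size} x {phase.gpu}, {phase.optimizer}, "
            f"lr={phase.learning_rate:g}, per-GPU batch={phase.per_gpu_batch()}"
        )
```

Under these assumptions, the per-GPU batch works out as follows:

- Phase 1: 8192 / 64 = 128 samples per GPU.
- Phase 2-4: 6144 / 64 = 96 samples per GPU.

If gradient accumulation were used, the per-pass micro-batch would be smaller by the accumulation factor.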