Update README.md
README.md CHANGED

```diff
@@ -113,7 +113,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 #### Training Hyperparameters
 
-- **Precision:** We
+- **Precision:** We trained the model in bfloat16 and publish it in FP32, the same precision as the base `google/flan-t5-large` model.
 - **Optimizer:** `apex.optimizers.FusedAdam`, a fused-kernel version of the AdamW optimizer from Nvidia Apex
 - **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 1e-4 to further adjust the CyberSolve LinAlg **1.1** weights
 - **Batch Size:** 64
```
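The hyperparameters listed above can be sketched in plain PyTorch. This is a minimal stand-in, not the actual training script: it assumes Apex is unavailable, so `torch.optim.AdamW` replaces `apex.optimizers.FusedAdam` (FusedAdam is a fused-kernel AdamW), and a toy linear layer replaces `google/flan-t5-large`.

```python
import torch

# Placeholder model; the README's run fine-tunes google/flan-t5-large.
model = torch.nn.Linear(8, 8)

# AdamW with the stated initial learning rate of 1e-4
# (stand-in for apex.optimizers.FusedAdam).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear LR schedule: decay from 1e-4 toward 0 over the training steps
# (step count here is arbitrary for illustration).
num_training_steps = 10
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / num_training_steps)
)

# One step with the stated batch size of 64, forward pass in bfloat16.
batch = torch.randn(64, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(batch).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()
```

On an actual A100 node the autocast device would be `"cuda"`, and the published checkpoint is cast back to FP32 as the diff notes.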