MarioBarbeque committed on commit 80f0578 (verified; parent: 17abd8d)

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

```diff
@@ -113,7 +113,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 #### Training Hyperparameters
 
-- **Precision:** We use FP32 precision, the same precision of the base "google/flan-t5-large" model.
+- **Precision:** We trained the model in bfloat16 and subsequently publish it in FP32, the same precision as the base "google/flan-t5-large" model.
 - **Optimizer:** `apex.optimizers.FusedAdam`, a fused kernel version of the AdamW optimizer from Nvidia Apex
 - **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 1e-4 to further adjust the CyberSolve LinAlg **1.1** weights
 - **Batch Size:** 64
```
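For reference, the linear learning-rate schedule named in the hyperparameters above can be sketched in plain Python. This is an illustrative sketch, not the repo's training code: the decay horizon (`total_steps`) and the absence of a warmup phase are assumptions; only the initial rate of 1e-4 comes from the README.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    """Linearly decay the learning rate from base_lr down to 0 over total_steps.

    Assumptions (not from the source): no warmup, decay target of 0.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)

# The rate starts at the initial 1e-4 and reaches 0 at the final step.
print(linear_lr(0, 1000))     # 0.0001
print(linear_lr(500, 1000))   # 5e-05
print(linear_lr(1000, 1000))  # 0.0
```

In practice the same shape is what `transformers.get_linear_schedule_with_warmup` produces when its warmup step count is 0; the helper here just makes the per-step arithmetic explicit.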