Update README.md
README.md CHANGED

```diff
@@ -113,7 +113,7 @@ The model was trained locally on a single-node with multiple Nvidia A100 GPUs us
 
 #### Training Hyperparameters
 
-- **Precision:** We
+- **Precision:** We trained the model in bfloat16 and publish it in FP32, the same precision as the base `google/flan-t5-large` model.
 - **Optimizer:** `apex.optimizers.FusedAdam`, a fused-kernel version of the AdamW optimizer from Nvidia Apex
 - **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 1e-4 to further adjust the CyberSolve LinAlg **1.1** weights
 - **Batch Size:** 64
```
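The hyperparameters listed above can be sketched in plain PyTorch. This is a minimal stand-in, not the actual training script: it assumes Apex is unavailable, so `torch.optim.AdamW` replaces `apex.optimizers.FusedAdam` (FusedAdam is a fused-kernel AdamW), and a toy linear layer replaces `google/flan-t5-large`.

```python
import torch

# Placeholder model; the README's run fine-tunes google/flan-t5-large.
model = torch.nn.Linear(8, 8)

# AdamW with the stated initial learning rate of 1e-4
# (stand-in for apex.optimizers.FusedAdam).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear LR schedule: decay from 1e-4 toward 0 over the training steps
# (step count here is arbitrary for illustration).
num_training_steps = 10
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / num_training_steps)
)

# One step with the stated batch size of 64, forward pass in bfloat16.
batch = torch.randn(64, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(batch).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()
```

On an actual A100 node the autocast device would be `"cuda"`, and the published checkpoint is cast back to FP32 as the diff notes.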