Update README.md
README.md CHANGED
@@ -127,9 +127,12 @@ The key hyperparameters used were:
 
 ### Training Results
 
-The model achieved
+The model achieved the following performance metrics:
 
+- **Final Training Loss**: 1.2823
+- **Final Evaluation Loss**: 1.1720
 - **Training Infrastructure**: 8x NVIDIA A100 40GB GPUs
+- **Training Duration**: 262:24:39 hours
 - **Total Training Steps**: 120,000
 - **Distributed Training**: NCCL backend with enhanced stability settings
 - **Memory Optimization**: BFloat16 precision with gradient accumulation
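
The added duration and step count can be combined into a rough throughput figure. The sketch below is illustrative only: it assumes `262:24:39` is wall-clock time in `H:MM:SS` form and that all 8 GPUs were in use for the whole run, neither of which the README states explicitly. The helper `parse_duration` and the variable names are hypothetical, not part of the repository.

```python
def parse_duration(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the README diff; their interpretation is an assumption.
total_seconds = parse_duration("262:24:39")  # assumed wall-clock H:MM:SS
total_steps = 120_000
num_gpus = 8  # assumed busy for the entire run

seconds_per_step = total_seconds / total_steps
gpu_hours = num_gpus * total_seconds / 3600

print(f"{total_seconds} s total")
print(f"{seconds_per_step:.2f} s/step")
print(f"{gpu_hours:.0f} GPU-hours")
```

Under those assumptions the run averages a little under 8 seconds per optimizer step, which is a useful sanity check when comparing against other hardware or batch configurations.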