Update README.md

README.md CHANGED

@@ -122,20 +122,6 @@ This synergistic combination creates a model that excels not only at providing a
 
 ### Training Results
 
-#### Performance Metrics
-- **Final Training Loss**: 0.003759
-- **Training Runtime**: 8,446.67 seconds (~2.35 hours)
-- **Training Samples per Second**: 156.929
-- **Training Steps per Second**: 4.904
-- **Total Training Steps**: 41,400
-- **Completed Epochs**: 4.999924559047633
-
-#### Resource Utilization
-- **Total Input Tokens Seen**: 2,531,530,240 tokens
-- **Total FLOPs**: 3.96 × 10²⁰
-- **DDP Timeout**: 180,000,000 seconds
-- **Plot Loss**: Enabled (training loss visualization available)
-
 ### Training Loss Curve
 The model training included comprehensive loss tracking and visualization. The training loss curve below shows the convergence pattern over the 41,400 training steps across 5 epochs:
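As a sanity check on the figures deleted in this diff, the reported throughput, runtime, and step counts can be cross-checked against each other. A minimal sketch in plain Python, using the numbers copied from the removed Performance Metrics and Resource Utilization lists; the derived per-epoch, per-step, and token-throughput figures are my own arithmetic and do not appear in the README:

```python
# Internal-consistency check of the training metrics removed in this diff.
runtime_s     = 8_446.67            # Training Runtime (seconds)
steps_per_s   = 4.904               # Training Steps per Second
samples_per_s = 156.929             # Training Samples per Second
total_steps   = 41_400              # Total Training Steps
epochs        = 4.999924559047633   # Completed Epochs
tokens_seen   = 2_531_530_240       # Total Input Tokens Seen

# Throughput times runtime should reproduce the reported step count (within ~1%).
implied_steps = steps_per_s * runtime_s
assert abs(implied_steps - total_steps) / total_steps < 0.01

# Derived figures (my arithmetic, not stated in the README):
steps_per_epoch = total_steps / epochs          # roughly 8,280 steps per epoch
batch_size      = samples_per_s / steps_per_s   # roughly 32 samples per step
tokens_per_s    = tokens_seen / runtime_s       # roughly 300K tokens/second
print(f"{steps_per_epoch:,.0f} steps/epoch, "
      f"~{batch_size:.0f} samples/step, "
      f"{tokens_per_s:,.0f} tokens/s")
```

The reported steps-per-second and total-steps figures agree to within about 0.1%, so the removed metrics were at least internally consistent.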