Update README.md
We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2).
* The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after checkpoint 160 and lasted for ~34 checkpoints. We restarted training from checkpoint 160, and training returned to normal.
* The second loss spike occurred at checkpoint 186, after we had restarted training to fix the first spike, and lasted for ~8 checkpoints.
* For every spike checkpoint, we also uploaded the corresponding normal checkpoint for easy comparison. The checkpoints are stored in separate branches.
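
Because the checkpoints live on separate branches, they can be fetched with the standard Hugging Face tooling by passing the branch name as the revision. The following is a minimal sketch, assuming `huggingface_hub` and `transformers` are installed and that each branch holds a checkpoint loadable with `AutoModelForCausalLM`. The helper names `list_branches` and `load_checkpoint` are hypothetical, and the actual branch names are not listed here, so take them from `list_branches` rather than guessing.

```python
# Minimal sketch (assumption): each spike / normal checkpoint sits on its own
# branch of a Hugging Face model repository. Helper names are hypothetical.


def list_branches(repo_id: str) -> list[str]:
    """Return the sorted names of all branches in a Hugging Face model repository."""
    # Imported lazily so that defining this helper does not require the library.
    from huggingface_hub import list_repo_refs

    refs = list_repo_refs(repo_id)
    return sorted(branch.name for branch in refs.branches)


def load_checkpoint(repo_id: str, branch: str):
    """Download and load the model weights stored on one branch of the repository."""
    # Imported lazily for the same reason as above.
    from transformers import AutoModelForCausalLM

    # `revision` accepts a branch name, tag, or commit hash.
    return AutoModelForCausalLM.from_pretrained(repo_id, revision=branch)
```

For example, `list_branches("LLM360/K2-Spike-1")` returns the branch names of the first-spike repository, and passing a spike branch and its matching normal branch to `load_checkpoint` gives two models to compare. These are large checkpoints, so each load needs substantial disk space and memory.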
|
We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
<img src="loss_spike.png" alt="k2 loss spikes"/>