victormiller committed on
Commit
22e1050
·
verified ·
1 Parent(s): 2516255

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -2,7 +2,10 @@
2
  license: apache-2.0
3
  ---
4
  # LLM360 Research Suite: K2 Loss Spike 1
5
- During the first K2 training phase, we encountered two loss spikes. This repo contains 34 checkpoints that capture the training dynamics during the loss spikes.
 
 
 
6
 
7
  <img src="k2_spike_1.png" alt="k2 spike 1"/>
8
 
 
2
  license: apache-2.0
3
  ---
4
  # LLM360 Research Suite: K2 Loss Spike 1
5
+ We encountered two major loss spikes while training K2.
6
+ The first loss spike occurred after X checkpoints and lasted ~34 checkpoints. We restarted training at checkpoint X, and training returned to normal.
7
+ The [second loss spike](https://huggingface.co/LLM360/K2-Spike-2/) occurred after restarting training to fix the first loss spike at checkpoint X and lasted ~8 checkpoints.
8
+ We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
9
 
10
  <img src="k2_spike_1.png" alt="k2 spike 1"/>
11