Update README.md
README.md
CHANGED
|
@@ -1,3 +1,33 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
+
---
|
| 5 |
+
license: apache-2.0
|
| 6 |
+
---
|
| 7 |
+
# LLM360 Research Suite: K2 Loss Spike 2
|
| 8 |
+
During the first K2 training phase, we encountered two loss spikes. This repo contains 8 checkpoints that capture the training dynamics during the loss spikes.
|
| 9 |
+
|
| 10 |
+
<img src="k2_spike_1.png" alt="k2 spike 1"/>
|
| 11 |
+
|
| 12 |
+
# Purpose
|
| 13 |
+
Loss spikes remain a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge of this topic.
|
| 14 |
+
|
| 15 |
+
## All Checkpoints
|
| 16 |
+
| Checkpoints | |
|
| 17 |
+
| ----------- | ----------- |
|
| 18 |
+
| [Checkpoint 186](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_186) | [Checkpoint 194](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_194) |
|
| 19 |
+
| [Checkpoint 188](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_188) | [Checkpoint 196](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_196) |
|
| 20 |
+
| [Checkpoint 190](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_190) | [Checkpoint 198](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_198) |
|
| 21 |
+
| [Checkpoint 192](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_192) | [Checkpoint 200](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_200) |
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
Each checkpoint is stored on its own branch. To list all branches from a local clone, run `git branch -a`.
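
The branch names in the table follow a regular pattern, so they can be generated rather than typed out. The sketch below is a minimal illustration, not an official loader: `checkpoint_revisions`, `tree_url`, and `list_remote_branches` are hypothetical helper names, and the `spike_ckpt_<N>` pattern and the 186 to 200 range are inferred from the links above. The remote listing assumes the `huggingface_hub` package is installed and that network access is available.

```python
REPO_ID = "LLM360/K2-Spike-2"  # repository named in the table links


def checkpoint_revisions(start=186, stop=200, step=2):
    """Return the branch names listed in the table (assumed pattern: spike_ckpt_<N>)."""
    return [f"spike_ckpt_{n}" for n in range(start, stop + 1, step)]


def tree_url(revision, repo_id=REPO_ID):
    """Build the browser URL for one checkpoint branch."""
    return f"https://huggingface.co/{repo_id}/tree/{revision}"


def list_remote_branches(repo_id=REPO_ID):
    """Ask the Hub for the repository's branches (needs network access).

    The import is local so the pure helpers above work without huggingface_hub.
    """
    from huggingface_hub import list_repo_refs

    return sorted(branch.name for branch in list_repo_refs(repo_id).branches)


if __name__ == "__main__":
    for rev in checkpoint_revisions():
        print(rev, tree_url(rev))
```

Any of the returned names can be passed as the `revision` argument to `huggingface_hub.snapshot_download` to fetch that checkpoint's files. Comparing the generated names against `list_remote_branches()` is a quick way to confirm that the assumed pattern still matches the repository.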
|
| 25 |
+
|
| 26 |
+
## Loss Spikes on the LLM360 Evaluation Suite
|
| 27 |
+
|
| 28 |
+
something here
|
| 29 |
+
|
| 30 |
+
## About the LLM360 Research Suite
|
| 31 |
+
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
|
| 32 |
+
|
| 33 |
+
|