Update README.md
README.md
CHANGED
|
@@ -2,13 +2,29 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
# LLM360 Research Suite: K2 Loss Spike 1
|
| 5 |
-
During the first K2 training phase, we encountered two loss spikes.
|
| 6 |
|
| 7 |
<img src="k2_spike_1.png" alt="k2 spike 1"/>
|
| 8 |
|
| 9 |
# Purpose
|
| 10 |
Loss spikes are still a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge of this topic.
|
| 11 |
|
| 12 |
## About the LLM360 Research Suite
|
| 13 |
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
|
| 14 |
|
|
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
# LLM360 Research Suite: K2 Loss Spike 1
|
| 5 |
+
During the first K2 training phase, we encountered two loss spikes. This repo contains 34 checkpoints that capture the training dynamics during the loss spikes.
|
| 6 |
|
| 7 |
<img src="k2_spike_1.png" alt="k2 spike 1"/>
|
| 8 |
|
| 9 |
# Purpose
|
| 10 |
Loss spikes are still a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge of this topic.
|
| 11 |
|
| 12 |
+
## First 10 Checkpoints
|
| 13 |
+
| Checkpoints | |
|
| 14 |
+
| ----------- | ----------- |
|
| 15 |
+
| [Checkpoint 160](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_160) | [Checkpoint 170](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_170) |
|
| 16 |
+
| [Checkpoint 162](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_162) | [Checkpoint 172](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_172) |
|
| 17 |
+
| [Checkpoint 164](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_164) | [Checkpoint 174](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_174) |
|
| 18 |
+
| [Checkpoint 166](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_166) | [Checkpoint 176](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_176) |
|
| 19 |
+
| [Checkpoint 168](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_168) | [Checkpoint 178](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_178) |
|
| 20 |
+
|
| 21 |
+
To list all checkpoint branches, run `git branch -a` in a local clone of this repo.
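
The checkpoint links in the table follow a regular pattern, so they can be generated rather than copied by hand. A minimal sketch in Python, assuming the `spike_ckpt_<N>` branch naming seen in the table holds for the listed checkpoints; the helper names (`branch_name`, `checkpoint_url`) are hypothetical and not part of any LLM360 tooling:

```python
# Build the branch names and URLs for the checkpoints listed in the table.
# Assumption: every listed checkpoint lives on a branch named "spike_ckpt_<N>".

REPO_ID = "LLM360/K2-Spike-1"
BASE_URL = "https://huggingface.co"


def branch_name(ckpt: int) -> str:
    """Return the branch name for a checkpoint number (hypothetical helper)."""
    return f"spike_ckpt_{ckpt}"


def checkpoint_url(ckpt: int) -> str:
    """Return the repository tree URL for a checkpoint number."""
    return f"{BASE_URL}/{REPO_ID}/tree/{branch_name(ckpt)}"


# The ten checkpoints shown in the table: 160, 162, ..., 178.
FIRST_TEN = list(range(160, 180, 2))

if __name__ == "__main__":
    for ckpt in FIRST_TEN:
        print(ckpt, checkpoint_url(ckpt))
```

The same branch name can presumably be used to fetch a single checkpoint, for example `git clone --branch spike_ckpt_160 https://huggingface.co/LLM360/K2-Spike-1`, though this README does not confirm that workflow.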
|
| 22 |
+
|
| 23 |
+
## Loss Spikes on the LLM360 Evaluation Suite
|
| 24 |
+
|
| 25 |
+
something here
|
| 26 |
+
|
| 27 |
+
|
| 28 |
## About the LLM360 Research Suite
|
| 29 |
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
|
| 30 |
|