---
license: apache-2.0
language:
- en
- ko
- ja
---
# Intermediate Checkpoints Release

For the first time among Korean-targeted LLMs, we're releasing **intermediate checkpoints** from the Tri family (**0.5B**, **1.9B**, and **7B**) to advance research on LLM training dynamics.

Checkpoints are published **every 20,000 steps (≈40B tokens)**, and each step's release lives on its own **branch**, so you can easily navigate between versions and analyze training progress at consistent intervals.

You can grab the **Tri-7B** model here: [https://huggingface.co/trillionlabs/Tri-7B](https://huggingface.co/trillionlabs/Tri-7B).

You can grab the **Tri-70B** preview model here: [https://huggingface.co/trillionlabs/Tri-70B-preview-SFT](https://huggingface.co/trillionlabs/Tri-70B-preview-SFT).

We're also sharing the **0.5B** and **1.9B** runs, which were originally produced for system bring-up and are now available as valuable artifacts for analyzing training behavior at smaller scales.

You can browse all intermediate checkpoints here:
- **Tri-0.5B** → [https://huggingface.co/trillionlabs/0.5B_250221_hf](https://huggingface.co/trillionlabs/0.5B_250221_hf)
- **Tri-1.9B** → [https://huggingface.co/trillionlabs/1.9B_250221_hf](https://huggingface.co/trillionlabs/1.9B_250221_hf)
- **Tri-7B** → [https://huggingface.co/trillionlabs/7B_250212_hf](https://huggingface.co/trillionlabs/7B_250212_hf)
- **Tri-70B (SFT Preview)** → [https://huggingface.co/trillionlabs/Tri-70B-intermediate-checkpoints](https://huggingface.co/trillionlabs/Tri-70B-intermediate-checkpoints)

Dive into the full details, including training configuration and loss curves, on our [blog](BLOG LINK).
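
Since each checkpoint lives on its own branch, a specific snapshot can be loaded by passing that branch as the `revision` argument to `from_pretrained`. A minimal sketch follows; note that the branch-naming scheme (`step-20000`, `step-40000`, ...) and the `branch_for_step` helper are illustrative assumptions, not confirmed by this card — check the repo's branch list (the "Files and versions" tab) for the actual names.

```python
def branch_for_step(step: int) -> str:
    # Hypothetical helper: map a training step to its release branch name.
    # Checkpoints are published every 20,000 steps, so reject other values.
    if step <= 0 or step % 20_000 != 0:
        raise ValueError("checkpoints are released every 20,000 steps")
    return f"step-{step}"


def tokens_at_step(step: int) -> int:
    # ~40B tokens per 20,000 steps, i.e. ~2M tokens per step.
    return step * 2_000_000


repo_id = "trillionlabs/7B_250212_hf"
revision = branch_for_step(100_000)  # ~200B tokens into the run

# Downloads several GB of weights -- uncomment to actually fetch:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
# model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```

To enumerate the branches that actually exist, `huggingface_hub.list_repo_refs(repo_id)` returns the repo's branch refs without downloading any weights.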