Update README.md (#3)
Update README.md (e35a4f2196745f43f418e626f75b899724de3723)
README.md CHANGED
@@ -1,4 +1,3 @@
-
 ---
 license: apache-2.0
 language:
@@ -9,23 +8,19 @@ base_model:
 
 # **K2-V2-Instruct**
 
-
-
-<img src="figures/banner.png" alt="k2-banner-placeholder"/>
-
-<br>
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/>
 
-
+📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)
 
+K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.
 
-<img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>
 
-
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/>
 
+Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.
 
-<img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>
 
-
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/>
 
 ---
 
@@ -63,7 +58,6 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 | **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
 
 
-
 Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.
 
 ---
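The context line in the hunk header above is the tail of the card's transformers quickstart, which this commit leaves unchanged. For orientation, here is a minimal sketch of the generate() flow that line implies; the repo id, dtype, and every line before the final `print` are assumptions, not taken from this page:

```python
# Hedged sketch of a standard transformers generation flow. Only the final
# print(...) line appears in the diff context above; the repo id and the
# generation settings below are placeholders, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/K2-V2-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision for a 70B dense model
    device_map="auto",           # shard across available GPUs
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```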
@@ -78,23 +72,26 @@ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf
 
 All mixtures, filtering rules, and data sources are fully released for reproducibility.
 
+Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed dataset and mixture information.
+
 ---
 
 ## **Model Description**
-- **Model type:**
-- **Training stage:**
+- **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm.
+- **Training stage:** Pre-training & Post-training
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 
 
+
 | Model Hyperparameter | Value |
 | ----------- | ----------- |
 | Total Parameters | 70B |
 | Hidden Size | 8,192 |
-| Intermediate Size (
+| Intermediate Size (FFN) | 28,672 |
 | Number of Attention Heads | 64 |
-| Number of
-| RMSNorm ε | 1e
+| Number of Layers | 80 |
+| RMSNorm ε | 1e-5 |
 | Pre-training Seq Length | 8,192 |
 | Post-training Seq Length | 524,288 |
 | Vocab Size | 250,000 |
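Taken together, the completed bullet ("decoder-only transformer with grouped-query attention and RMSNorm") and the repaired table describe a Llama-style dense decoder. As a reading aid, a minimal sketch mapping the table onto a Hugging Face `LlamaConfig`; the config class is an assumed stand-in, and `num_key_value_heads` is a placeholder since the card gives no GQA group count:

```python
# Hedged sketch: the card's hyperparameter table expressed as a Llama-style
# config. LlamaConfig is an assumed stand-in for the model's real config
# class; num_key_value_heads is a placeholder (the card does not state it).
from transformers import LlamaConfig

k2_v2_like = LlamaConfig(
    hidden_size=8192,              # Hidden Size
    num_hidden_layers=80,          # Number of Layers
    num_attention_heads=64,        # Number of Attention Heads
    num_key_value_heads=8,         # placeholder GQA group count
    intermediate_size=28672,       # Intermediate Size (FFN)
    rms_norm_eps=1e-5,             # RMSNorm ε
    max_position_embeddings=8192,  # Pre-training Seq Length
    vocab_size=250000,             # Vocab Size
)
```

Note the jump from the pre-training length (8,192) to the post-training length (524,288 = 2^19); the instruct checkpoint's config would raise `max_position_embeddings` to the post-training value.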
@@ -103,8 +100,10 @@ All mixtures, filtering rules, and data sources are fully released for reproduci
 
 ## Citation
 
+If you use K2-V2-Instruct in your research, please cite the following:
+
 ```
-@misc{
+@misc{llm360_k2v2_2025,
 title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
 author = {K2 Team},
 year = {2025},
@@ -115,3 +114,8 @@ All mixtures, filtering rules, and data sources are fully released for reproduci
 ```
 
 
+
+
+
+
+