desaifan-mbzuai committed
Commit 41732d0 (verified) · 1 parent: 0dd8265

Update README.md (#3)


- Update README.md (e35a4f2196745f43f418e626f75b899724de3723)

Files changed (1): README.md (+22, -18)
README.md CHANGED
@@ -1,4 +1,3 @@
-
  ---
  license: apache-2.0
  language:
@@ -9,23 +8,19 @@ base_model:
 
  # **K2-V2-Instruct**
 
- 📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf ) - 📝 [Code](github_url) - 🏠 [Project Page](https://huggingface.co/LLM360/K2-V2)
-
- <img src="figures/banner.png" alt="k2-banner-placeholder"/>
-
- <br>
+ <img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/>
 
- K2-V2 is our best fully open source model to date and ranked among the best open weight models of its class. As the latest base model in the LLM360's strongest project family, K2 features a dense architecture with 70 billion parameters.
+ 📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏠 [Project Page](https://huggingface.co/LLM360/K2-V2)
 
+ K2-V2 is our most capable fully open model to date and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.
 
- <img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>
+ <img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/>
 
- Beyond standard competencies like knowledge and conversation, K2 provides advanced capabilities, including long context consistency, deep mathematical knowledge, and reasoning behaviors. These serve as foundational building blocks that enable sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.
+ Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.
 
- <img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>
-
- During our light SFT phase, our goal is to capitalize on the reasoning capabilities obtained during mid-training while allowing users to experience the model without having to wait for lengthy reasoning to complete.
+ <img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/>
 
  ---
@@ -63,7 +58,6 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  | **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
 
-
  Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.
 
  ---
@@ -78,23 +72,26 @@ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf)
 
  All mixtures, filtering rules, and data sources are fully released for reproducibility.
 
+ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed dataset and mixture information.
+
  ---
 
  ## **Model Description**
- - **Model type:** Language model with transformer architecture
- - **Training stage:** Pretraining & Post-training
+ - **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm.
+ - **Training stage:** Pre-training & Post-training
  - **Language(s) (NLP):** English
  - **License:** Apache 2.0
 
 
+
  | Model Hyperparameter | Value |
  | ----------- | ----------- |
  | Total Parameters | 70B |
  | Hidden Size | 8,192 |
- | Intermediate Size (MLPs) | 28,672 |
+ | Intermediate Size (FFN) | 28,672 |
  | Number of Attention Heads | 64 |
- | Number of Hidden Layers | 80 |
- | RMSNorm ε | 1e^-5 |
+ | Number of Layers | 80 |
+ | RMSNorm ε | 1e-5 |
  | Pre-training Seq Length | 8,192 |
  | Post-training Seq Length | 524,288 |
  | Vocab Size | 250,000 |
@@ -103,8 +100,10 @@ All mixtures, filtering rules, and data sources are fully released for reproducibility.
 
  ## Citation
 
+ If you use K2-V2-Instruct in your research, please cite the following:
+
  ```
- @misc{llm360@k2v2,
+ @misc{llm360_k2v2_2025,
  title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author = {K2 Team},
  year = {2025},
@@ -115,3 +114,8 @@ All mixtures, filtering rules, and data sources are fully released for reproducibility.
  ```
 
 
+
+
+
+
+
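The card's quick-start code appears above only as hunk context (`print(tokenizer.decode(outputs[0], skip_special_tokens=True))`). For orientation, here is a minimal sketch of such a quick start using the standard Hugging Face `transformers` API; the repo id `LLM360/K2-V2-Instruct` is an assumption inferred from the card's title, and the prompt is illustrative.

```python
# Minimal quick-start sketch, not the card's exact snippet.
# Assumptions: repo id "LLM360/K2-V2-Instruct" (inferred from the card title),
# a chat template shipped with the tokenizer, and enough GPU memory for 70B weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/K2-V2-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~140 GB of weights for a 70B model in bf16
    device_map="auto",           # shard the checkpoint across available GPUs
)

messages = [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
# Same decode call that appears in the diff's hunk context above.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package to be installed.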
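The updated **Model type** bullet describes a standard decoder-only transformer with grouped-query attention and RMSNorm. Below is a minimal sketch of those two components at the table's published sizes (hidden size 8,192, 64 attention heads, ε = 1e-5); the KV-head count is an assumption, since the card does not list one.

```python
# Sketch of RMSNorm and grouped-query attention (GQA) at K2-V2's published sizes.
# hidden=8192, n_heads=64, and eps=1e-5 come from the hyperparameter table;
# n_kv_heads=8 is an ASSUMPTION (the card does not state a KV-head count).
import torch
import torch.nn.functional as F

hidden, n_heads, n_kv_heads = 8192, 64, 8
head_dim = hidden // n_heads  # 128

def rms_norm(x, weight, eps=1e-5):
    # Normalize by the root-mean-square of the activations, then rescale.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

q_proj = torch.nn.Linear(hidden, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden, n_kv_heads * head_dim, bias=False)  # 8x smaller than Q
v_proj = torch.nn.Linear(hidden, n_kv_heads * head_dim, bias=False)
o_proj = torch.nn.Linear(n_heads * head_dim, hidden, bias=False)

def gqa(x: torch.Tensor) -> torch.Tensor:
    b, t, _ = x.shape
    q = q_proj(x).view(b, t, n_heads, head_dim).transpose(1, 2)
    k = k_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = v_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of 64 // 8 = 8 query heads shares one K/V head.
    k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return o_proj(out.transpose(1, 2).reshape(b, t, n_heads * head_dim))

x = rms_norm(torch.randn(1, 16, hidden), torch.ones(hidden))
print(gqa(x).shape)  # torch.Size([1, 16, 8192])
```

Under the assumed 8 KV heads, each K/V projection (and hence the KV cache) is 8x smaller than the query projection, which helps make the card's 524,288-token post-training sequence length tractable to serve.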
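The hyperparameter table can also be sanity-checked against the stated 70B total. A back-of-envelope count follows, assuming a SwiGLU-style three-matrix FFN, 8 KV heads, and no bias terms; none of these are stated on the card.

```python
# Back-of-envelope parameter count from the hyperparameter table above.
# ASSUMPTIONS: SwiGLU-style FFN (gate/up/down), 8 KV heads, no biases.
hidden, ffn, layers, n_heads, n_kv_heads, vocab = 8192, 28672, 80, 64, 8, 250_000
head_dim = hidden // n_heads  # 128

attn = 2 * hidden * hidden + 2 * hidden * n_kv_heads * head_dim  # Q, O + K, V
mlp = 3 * hidden * ffn                                           # gate, up, down
norms = 2 * hidden                                               # two RMSNorms per layer
per_layer = attn + mlp + norms                                   # ~856M

embed = vocab * hidden                                           # ~2.05B
tied = layers * per_layer + embed + hidden                       # shared in/out embedding
untied = tied + embed                                            # separate LM head

print(f"tied:   {tied / 1e9:.1f}B")    # ~70.5B
print(f"untied: {untied / 1e9:.1f}B")  # ~72.5B
```

Either way the estimate lands near the table's rounded 70B total, so the listed dimensions are internally consistent.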
 