Update README.md (#3)
Update README.md (e35a4f2196745f43f418e626f75b899724de3723)
README.md CHANGED
@@ -1,4 +1,3 @@
-
 ---
 license: apache-2.0
 language:
@@ -9,23 +8,19 @@ base_model:
 
 # **K2-V2-Instruct**
 
-
-
-<img src="figures/banner.png" alt="k2-banner-placeholder"/>
-
-<br>
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/>
 
-
+📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)
 
+K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.
 
-<img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>
 
-
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/>
 
+Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.
 
-<img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>
 
-
+<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/>
 
 ---
 
@@ -63,7 +58,6 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 | **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
 
 
-
 Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.
 
 ---
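The context line in the hunk header above is the tail of the card's transformers quickstart, which this commit leaves unchanged. For orientation, here is a minimal sketch of the generate() flow that line implies; the repo id, dtype, and every line before the final `print` are assumptions, not taken from this page:

```python
# Hedged sketch of a standard transformers generation flow. Only the final
# print(...) line appears in the diff context above; the repo id and the
# generation settings below are placeholders, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/K2-V2-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision for a 70B dense model
    device_map="auto",           # shard across available GPUs
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```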
@@ -78,23 +72,26 @@ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf
 
 All mixtures, filtering rules, and data sources are fully released for reproducibility.
 
+Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed dataset and mixture information.
+
 ---
 
 ## **Model Description**
-- **Model type:**
-- **Training stage:**
+- **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm.
+- **Training stage:** Pre-training & Post-training
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0
 
 
+
 | Model Hyperparameter | Value |
 | ----------- | ----------- |
 | Total Parameters | 70B |
 | Hidden Size | 8,192 |
-| Intermediate Size (
+| Intermediate Size (FFN) | 28,672 |
 | Number of Attention Heads | 64 |
-| Number of
-| RMSNorm ε | 1e
+| Number of Layers | 80 |
+| RMSNorm ε | 1e-5 |
 | Pre-training Seq Length | 8,192 |
 | Post-training Seq Length | 524,288 |
 | Vocab Size | 250,000 |
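Taken together, the completed bullet ("decoder-only transformer with grouped-query attention and RMSNorm") and the repaired table describe a Llama-style dense decoder. As a reading aid, a minimal sketch mapping the table onto a Hugging Face `LlamaConfig`; the config class is an assumed stand-in, and `num_key_value_heads` is a placeholder since the card gives no GQA group count:

```python
# Hedged sketch: the card's hyperparameter table expressed as a Llama-style
# config. LlamaConfig is an assumed stand-in for the model's real config
# class; num_key_value_heads is a placeholder (the card does not state it).
from transformers import LlamaConfig

k2_v2_like = LlamaConfig(
    hidden_size=8192,              # Hidden Size
    num_hidden_layers=80,          # Number of Layers
    num_attention_heads=64,        # Number of Attention Heads
    num_key_value_heads=8,         # placeholder GQA group count
    intermediate_size=28672,       # Intermediate Size (FFN)
    rms_norm_eps=1e-5,             # RMSNorm ε
    max_position_embeddings=8192,  # Pre-training Seq Length
    vocab_size=250000,             # Vocab Size
)
```

Note the jump from the pre-training length (8,192) to the post-training length (524,288 = 2^19); the instruct checkpoint's config would raise `max_position_embeddings` to the post-training value.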
@@ -103,8 +100,10 @@ All mixtures, filtering rules, and data sources are fully released for reproduci
 
 ## Citation
 
+If you use K2-V2-Instruct in your research, please cite the following:
+
 ```
-@misc{
+@misc{llm360_k2v2_2025,
 title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
 author = {K2 Team},
 year = {2025},
@@ -115,3 +114,8 @@ All mixtures, filtering rules, and data sources are fully released for reproduci
 ```
 
 
+
+
+
+
+