---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---

# **K2-V2-Instruct**

📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](github_url) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)

<img src="figures/banner.png" alt="k2-banner-placeholder"/>

<br>

K2-V2 is our best fully open-source model to date, and it ranks among the best open-weight models of its class. As the latest base model in LLM360's strongest project family, K2 features a dense architecture with 70 billion parameters.

<img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>

Beyond standard competencies like knowledge and conversation, K2 provides advanced capabilities, including long-context consistency, deep mathematical knowledge, and reasoning behaviors. These serve as foundational building blocks that enable sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.

<img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>

During the light SFT phase, our goal was to capitalize on the reasoning capabilities obtained during mid-training while allowing users to experience the model without having to wait for lengthy reasoning traces to complete.

---

## **Quick Start**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the instruct checkpoint (this card's model, not the base K2-V2).
model = AutoModelForCausalLM.from_pretrained("LLM360/K2-V2-Instruct", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2-Instruct")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
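
For interactive use, you may want tokens printed as they are produced rather than after generation finishes. A minimal sketch using transformers' built-in `TextStreamer`, reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated; skip_prompt hides the
# echoed input, and skip_special_tokens is forwarded to the decode call.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```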

---

## **Evaluation Summary**

| Model | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|-------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
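
The Low/Medium/High rows correspond to increasing reasoning effort, with High spending the longest reasoning traces for the strongest scores. The exact selection mechanism is described in the tech report; purely as a hypothetical sketch, one way this could surface is through an extra chat-template keyword (the `reasoning_effort` kwarg below is illustrative, not a confirmed part of this model's template; `apply_chat_template` does forward extra keyword arguments to the underlying Jinja template):

```python
# Hypothetical: "reasoning_effort" is an illustrative kwarg, not a confirmed
# feature of the K2-V2-Instruct chat template. Extra keyword arguments to
# apply_chat_template are passed through to the Jinja template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="high",  # hypothetical: "low" | "medium" | "high"
)
```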

Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.

---

## **Datasets & Mixtures**

### **SFT Mix**

* **TxT360-3efforts**: curated instruction data + mixed-difficulty reasoning traces
* Tool-calling demonstrations (see the sketch after this list)
* A small but high-value corpus chosen to showcase the model's potential

All mixtures, filtering rules, and data sources are fully released for reproducibility.
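
To illustrate the tool-calling demonstrations mentioned above: recent transformers versions let `apply_chat_template` take a `tools` list of typed, docstring-annotated Python functions whose schemas are rendered into the prompt. A minimal sketch, assuming the K2 chat template supports this standard interface (`get_weather` is a made-up example tool):

```python
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...  # hypothetical tool; its schema comes from the signature and docstring

messages = [{"role": "user", "content": "What's the weather in Abu Dhabi?"}]
# tools= renders each function's JSON schema into the prompt so the model can
# respond with a structured tool call instead of plain text.
text = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    tokenize=False,
    add_generation_prompt=True,
)
```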

---

## **Model Description**

- **Model type:** Language model with transformer architecture
- **Training stage:** Pretraining & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0

| Model Hyperparameter | Value |
| ----------- | ----------- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (MLPs) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Hidden Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
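
These values can be sanity-checked against the released checkpoint's configuration. A small sketch; the field names assume a standard Llama-style config, which is an assumption about this checkpoint:

```python
from transformers import AutoConfig

# Field names below assume a Llama-style configuration for this checkpoint.
config = AutoConfig.from_pretrained("LLM360/K2-V2-Instruct")
print(config.hidden_size)              # expected: 8192
print(config.intermediate_size)        # expected: 28672
print(config.num_attention_heads)      # expected: 64
print(config.num_hidden_layers)        # expected: 80
print(config.vocab_size)               # expected: 250000
print(config.max_position_embeddings)  # expected: 524288 after long-context post-training
```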

---

## Citation

```bibtex
@misc{llm360_k2v2,
      title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
      author = {K2 Team},
      year = {2025},
      archivePrefix = {arXiv},
      eprint = {XXXX.XXXXX},
      primaryClass = {cs.CL}
}
```