---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---

# **K2-V2-Instruct**

<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/>

📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)

K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.


<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/>

Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.


<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/>

---

## **Quick Start**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
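
The snippet above loads the model in its default precision and only prints the completion once generation has finished. For interactive use, loading in bfloat16 and streaming tokens with the standard `transformers` `TextStreamer` utility can be more practical. The variant below is a minimal sketch under those assumptions (bf16-capable hardware, same checkpoint name); adjust it to your deployment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Assumption: the target GPU supports bfloat16; otherwise drop torch_dtype.
model = AutoModelForCausalLM.from_pretrained(
    "llm360/k2-v2", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

messages = [{"role": "user", "content": "Explain why the derivative of sin(x) is cos(x)."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs, max_new_tokens=200, streamer=streamer)
```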

---

## **Evaluation Summary**

| Model Specifications | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|----------------------|--------------|--------|--------|-------|---------|--------|-------|------------|--------|
| **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |


Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.

---

## **Datasets & Mixtures**

### **SFT Mix**

* **TxT360-3efforts**: curated instructions plus reasoning traces of mixed difficulty
* Tool-calling demonstrations
* A small but high-value corpus chosen to showcase the model's potential

All mixtures, filtering rules, and data sources are fully released for reproducibility.

Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed datasets and mixtures information.

---

## **Model Description**
- **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm (a minimal sketch follows the hyperparameter table below).
- **Training stage:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0



| Model Hyperparameter      | Value |
| ----------- | ----------- |
| Total Parameters      | 70B       |
| Hidden Size   | 8,192        |
| Intermediate Size (FFN)   | 28,672        |
| Number of Attention Heads   | 64        |
| Number of Layers  | 80        |
| RMSNorm ε  | 1e-5        |
| Pre-training Seq Length   | 8,192        |
| Post-training Seq Length   | 524,288        |
| Vocab Size | 250,000 |
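
For readers who want to map these hyperparameters onto code, the following is a minimal PyTorch sketch of the two components named in the model description above, RMSNorm and grouped-query attention, instantiated with the hidden size, head count, and ε from the table. The number of key/value heads is not listed on this card, so `num_kv_heads` below is a placeholder, and rotary position embeddings are omitted for brevity; the authoritative implementation is in the released training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, hidden_size: int = 8192, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class GroupedQueryAttention(nn.Module):
    """Attention in which several query heads share one key/value head."""
    def __init__(self, hidden_size: int = 8192, num_heads: int = 64, num_kv_heads: int = 8):
        # num_kv_heads = 8 is an illustrative assumption, not a documented value.
        super().__init__()
        self.head_dim = hidden_size // num_heads  # 8192 / 64 = 128
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of num_heads // num_kv_heads query heads reuses the same k/v head.
        k = k.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        v = v.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```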

---

## Citation

If you use K2-V2-Instruct in your research, please cite the following:

```
@misc{llm360_k2v2_2025,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```