---
license: apache-2.0
language:
- en
---

# **K2-V2**

📚 [Tech Report](arxiv_url) - 📝 [Code](github_url) - 🏢 [Project Page](self_url)

K2-V2 is our best fully open-source model to date and ranks among the best open-weight models of its class. As the latest base model in LLM360's strongest project family, K2 features a dense architecture with 70 billion parameters.

<img src="figures/sft-models.pdf" alt="k2-sft-aime"/>

Beyond standard competencies such as knowledge and conversation, K2 provides advanced capabilities, including long-context consistency, deep mathematical knowledge, and reasoning behaviors. These are foundational building blocks for sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.


<img src="figures/base-models.pdf" alt="k2-base-gpqa"/>

---

## **Usage**

```python
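from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 70B base model; device_map="auto" (requires `accelerate`) spreads layers across devices.
# torch_dtype="auto" keeps the checkpoint's stored precision instead of upcasting to float32.
model = AutoModelForCausalLM.from_pretrained(
    "llm360/k2-v2", device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

# Tokenize the prompt and move the tensors to the model's device.
prompt = "Explain why the derivative of sin(x) is cos(x)."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 200 new tokens, then decode the full sequence (prompt + completion).
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))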
```
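
Before loading, it can help to estimate how much memory the weights alone will occupy. The sketch below is a back-of-the-envelope calculation from the 70B parameter count stated in this card. The helper name `weight_memory_gib` and the dtype table are illustrative, not part of any library, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate; ignores activations, KV cache, and overhead.
# The dtype table and helper name are illustrative assumptions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # 70e9 is the total parameter count reported in this model card.
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(70e9, dtype):.0f} GiB")
```

With `device_map="auto"`, this total is spread over the visible devices, with offload to CPU memory if the accelerators run out of space.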

---

## **Evaluation Summary**

| Task / Model | base | mid-1 | mid-2 | mid-3 | mid-4 | Qwen2.5-72B | Llama3.0-70B | Llama3.1-70B | Olmo3-32B |
|--------------|------|-------|-------|-------|-------|--------------|---------------|---------------|------------|
| **Architecture** | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense |
| **# Total Params** | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
| **# Activated Params** | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
| **General Tasks** | | | | | | | | | |
| **MMLU** | 74.3 | 74.4 | 73.5 | 75.0 | 75.2 | **86.1** | <u>79.5</u> | 79.3 | 75.2 |
| **MMLU-Pro** | 43.7 | 46.8 | 48.1 | **59.8** | 57.0 | <u>58.1</u> | 52.8 | 53.8 | 49.6 |
| **BBH** | 68.4 | 79.8 | 81.1 | 82.2 | <u>83.2</u> | **86.3** | 82.2 | 82.1 | 77.6 |
| **HELLASWAG** | <u>87.8</u> | 86.9 | 86.6 | 86.6 | 86.0 | 87.6 | **88.0** | 85.0 | 84.8 |
| **WINOGRANDE** | 82.6 | 83.7 | 83.7 | 83.7 | 83.0 | 83.9 | <u>85.3</u> | 79.8 | **90.3** |
| **PIQA** | 84.2 | 84.0 | 83.3 | 82.9 | 83.1 | 83.5 | <u>84.6</u> | 84.3 | **85.6** |
| **TRUTHFULQA** | 54.0 | 54.9 | 55.1 | <u>55.8</u> | 53.9 | **60.5** | 45.6 | 49.7 | 54.9 |
| **Math & STEM Tasks** | | | | | | | | | |
| **GPQA-DIAMOND** | 26.3 | 31.3 | 27.8 | <u>43.9</u> | **55.1** | 34.9 | 21.2 | 27.3 | 30.3 |
| **GSM8K** | 68.0 | 76.4 | 82.1 | **93.6** | <u>92.5</u> | 91.2 | 83.2 | 81.1 | 80.5 |
| **MATH** | 27.8 | 38.2 | 41.1 | **94.7** | <u>91.4</u> | 58.5 | 41.9 | 41.6 | 43.4 |
| **AIME 2025** | 0.0 | 17.6 | 25.1 | **53.2** | <u>46.9</u> | 1.7 | 0.1 | 0.2 | 14.7 |
| **ARC-CHALLENGE** | 64.9 | 66.4 | 66.4 | 66.0 | 66.3 | **72.4** | <u>69.2</u> | 64.9 | 65.4 |
| **Coding Tasks** | | | | | | | | | |
| **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
| **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
| **Logic Puzzles** | | | | | | | | | |
| **COUNTDOWN** | 1.3 | <u>53.3</u> | 53.1 | 35.9 | **75.6** | 6.0 | 1.0 | 0.5 | 23.2 |
| **KK-4 PEOPLE** | 4.8 | 44.9 | <u>68.0</u> | 64.5 | **92.9** | 26.1 | 4.2 | 7.6 | 42.4 |
| **KK-8 PEOPLE** | 0.5 | 23.2 | 41.3 | <u>51.6</u> | **82.8** | 5.7 | 1.1 | 1.3 | 13.0 |
| **ORDER-15 ITEMS** | 4.7 | 30.7 | 47.2 | <u>55.8</u> | **87.6** | 37.0 | 3.5 | 4.5 | 25.0 |
| **ORDER-30 ITEMS** | 0.0 | 0.3 | 3.0 | <u>34.1</u> | **40.3** | 0.7 | 0.2 | 0.1 | 0.6 |
| **Instruction Following** | | | | | | | | | |
| **IFEVAL** | 17.4 | 26.2 | 28.5 | <u>34.5</u> | 26.7 | **40.3** | 15.1 | 17.4 | 13.2 |
| **Arabic** | | | | | | | | | |
| **MMLU-Arabic** | 65.4 | 66.1 | 64.5 | 66.6 | 65.5 | **74.1** | 65.0 | <u>66.8</u> | 47.8 |


In each row, **bold** marks the best score and <u>underline</u> the second best. Please refer to our [Tech Report](arxiv_url) for detailed evaluation results.
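
To read the table as a progression, one option is to compute the score change between the `base` and `mid-4` columns. The sketch below does this for a handful of rows copied from the table; the selection of tasks and the helper name `gains` are illustrative choices, not part of the evaluation code.

```python
# Scores copied from the table in this card: task -> (base, mid-4).
SCORES = {
    "MMLU": (74.3, 75.2),
    "GSM8K": (68.0, 92.5),
    "MATH": (27.8, 91.4),
    "GPQA-DIAMOND": (26.3, 55.1),
    "HUMANEVAL": (50.0, 54.3),
    "IFEVAL": (17.4, 26.7),
}


def gains(scores):
    """Return {task: mid-4 score minus base score}, rounded to one decimal."""
    return {task: round(final - base, 1) for task, (base, final) in scores.items()}


if __name__ == "__main__":
    # Print tasks from largest to smallest change.
    for task, delta in sorted(gains(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{task:<14} {delta:+.1f}")
```

On these rows the largest shifts are on the math and science benchmarks, while MMLU moves by less than one point.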

---

## **Datasets & Mixtures**

K2 training is organized into three stages, each using a transparent, publicly released mixture:

### **Pretraining Mix**

* Large-scale natural text corpus (web, books, code, multilingual)
* Balanced mixture optimized for stable scaling and broad knowledge
* ~12T tokens

### **Mid-Training Mix**

* **TxT360-Midas**: reasoning-oriented + long-context extensions
* Domain-focused sources: math, programming, scientific literature
* Synthetic expansions where natural data is scarce

### **SFT Mix**

* See the [K2-V2-Instruct model card](https://huggingface.co/LLM360/K2-V2-Instruct) for details

All mixtures, filtering rules, and data sources are fully released for reproducibility.

---

## **Model Description**
- **Model type:** Language model with transformer architecture
- **Language(s) (NLP):** English
- **License:** Apache 2.0


| Model Hyperparameter      | Value |
| ----------- | ----------- |
| Total Parameters      | 70B       |
| Hidden Size   | 8,192        |
| Intermediate Size (MLPs)   | 28,672        |
| Number of Attention Heads   | 64        |
| Number of Hidden Layers  | 80        |
| RMSNorm ε  | 1e-5        |
| Max Pre-training Seq Length   | 8,192        |
| Max Mid-training Seq Length   | 524,288        |
| Vocab Size | 250,000 |
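
As a sanity check on the table, the hyperparameters can be turned into a rough parameter count. This is a sketch under stated assumptions, not the official accounting: it assumes a Llama-style decoder block with a gated three-matrix MLP, no bias terms, and grouped-query attention. The number of key/value heads (`kv_heads=8`) and whether input and output embeddings are tied are not stated in this card, so both are hypothetical parameters here.

```python
def estimate_params(hidden=8192, intermediate=28672, layers=80, heads=64,
                    vocab=250_000, kv_heads=8, tied_embeddings=False):
    """Approximate parameter count for a Llama-style dense decoder.

    kv_heads and tied_embeddings are ASSUMPTIONS; the model card does not
    state them. Bias terms are assumed absent.
    """
    head_dim = hidden // heads
    # Attention: Q and O projections are hidden x hidden; K and V shrink with GQA.
    attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    # Gated MLP (assumed): gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    # The trailing `hidden` accounts for the final RMSNorm.
    return layers * per_layer + embed + lm_head + hidden


if __name__ == "__main__":
    for tied in (True, False):
        total = estimate_params(tied_embeddings=tied)
        print(f"tied_embeddings={tied}: {total / 1e9:.1f}B parameters")
```

Under these assumptions the estimate lands in the low 70 billions, in the same range as the 70B figure in the table. Changing the assumed number of key/value heads or the embedding tying shifts the total by billions of parameters.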

---

## **Citation & Acknowledgment**

If you use K2-V2 in your research, please cite our [K2-V2 paper](LINK):

```
@misc{llm360k2v2,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```