---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---
|
|
|
|
|
# **K2-V2-Instruct** |
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/> |
|
|
|
|
|
📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)
|
|
|
|
|
K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family. |
|
|
|
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/> |
|
|
|
|
|
Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows. |
|
|
|
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/> |
|
|
|
|
|
--- |
|
|
|
|
|
## **Quick Start** |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load this card's model and tokenizer (a 70B model; requires sufficient GPU memory).
model_id = "LLM360/K2-V2-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
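For longer, open-ended responses you may prefer sampling over greedy decoding. Continuing from the snippet above, the settings below are illustrative defaults, not officially recommended values for K2-V2:

```python
# Illustrative sampling settings (assumed, not official K2-V2 recommendations).
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```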
|
|
|
|
|
--- |
|
|
|
|
|
## **Evaluation Summary** |
|
|
|
|
|
| Model | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|-------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
|
|
|
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results. |
|
|
|
|
|
--- |
|
|
|
|
|
## **Datasets & Mixtures** |
|
|
|
|
|
### **SFT Mix** |
|
|
|
|
|
* **TxT360-3efforts**: curated instruction data plus mixed-difficulty reasoning traces
* Tool-calling demonstrations (see the sketch below)
* A small but high-value corpus chosen to showcase the model's potential
|
|
|
|
|
All mixtures, filtering rules, and data sources are fully released for reproducibility. |
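Because the SFT mix includes tool-calling demonstrations, tool schemas can be passed through the standard transformers chat-template interface. The sketch below assumes K2-V2-Instruct's chat template defines tool formatting; `get_weather` is a hypothetical function used only for illustration.

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical stub, for illustration only

tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2-Instruct")
messages = [{"role": "user", "content": "What's the weather in Abu Dhabi?"}]

# transformers converts the function's signature and docstring into a JSON schema
# and renders it into the prompt, provided the chat template defines tool formatting.
text = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(text)
```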
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed datasets and mixtures information. |
|
|
|
|
|
--- |
|
|
|
|
|
## **Model Description** |
|
|
- **Model type:** K2-V2 follows a standard decoder-only transformer architecture with grouped-query attention and RMSNorm (sketched below).
- **Training stage:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0
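As a quick reference, RMSNorm normalizes activations by their root-mean-square instead of the mean and variance used in LayerNorm. A minimal PyTorch sketch, using the ε = 1e-5 from the hyperparameter table below:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch: scales activations by their root-mean-square."""

    def __init__(self, hidden_size: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the hidden dimension, then apply a learned scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```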
|
|
|
|
|
|
|
|
|
|
|
| Model Hyperparameter | Value |
| ----------- | ----------- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (FFN) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use K2-V2-Instruct in your research, please cite the following: |
|
|
|
|
|
```bibtex
@misc{llm360_k2v2_2025,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```
|