---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---

# **K2-V2-Instruct**

📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](github_url) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)

<img src="figures/banner.png" alt="k2-banner-placeholder"/>

<br>

K2-V2 is our best fully open-source model to date and ranks among the strongest open-weight models in its class. As the latest base model in LLM360's flagship K2 project family, it features a dense architecture with 70 billion parameters.

<img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>

Beyond standard competencies like knowledge and conversation, K2 provides advanced capabilities, including long-context consistency, deep mathematical knowledge, and reasoning behaviors. These serve as foundational building blocks that enable sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.

<img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>

During the light SFT phase, our goal was to capitalize on the reasoning capabilities obtained during mid-training while letting users experience the model without waiting for lengthy reasoning traces to complete.

---

## **Quick Start**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "llm360/k2-v2", device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template and append
# the generation prompt so the model replies as the assistant.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
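
For hardware planning, a back-of-envelope estimate of the weight footprint alone may be useful; this sketch assumes bf16 weights and ignores the KV cache and activations, which add more memory, especially at long context:

```python
# Approximate accelerator memory needed just for the weights of a 70B dense model.
params = 70e9
bytes_per_param = 2                              # bf16 / fp16: 2 bytes per parameter
weight_gib = params * bytes_per_param / 1024**3  # convert bytes to GiB
print(f"~{weight_gib:.0f} GiB for weights alone")
```

At roughly 130 GiB of weights, `device_map="auto"` will typically shard the model across multiple GPUs.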

---

## **Evaluation Summary**

| Model | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|-------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |

Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.

---

## **Datasets & Mixtures**

### **SFT Mix**

* **TxT360-3efforts**: curated instruction data with mixed-difficulty reasoning traces
* Tool-calling demonstrations
* A small but high-value corpus chosen to showcase the model's potential

All mixtures, filtering rules, and data sources are fully released for reproducibility.

---

## **Model Description**
- **Model type:** Language model with transformer architecture
- **Training stage:** Pretraining & post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0

| Model Hyperparameter | Value |
| -------------------- | ----- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (MLPs) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Hidden Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
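
As a sanity check, the dimensions in the table roughly reproduce the stated 70B total. The sketch below is an estimate only: the number of KV heads is not listed in the table, so grouped-query attention with 8 KV heads (common in 70B-class dense models) and untied input/output embeddings are assumptions.

```python
# Rough parameter-count check against the hyperparameter table above.
hidden, inter, layers = 8192, 28672, 80
heads, vocab = 64, 250_000
kv_heads = 8                    # assumption: GQA with 8 KV heads (not in the table)
head_dim = hidden // heads      # 128

attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # Wq, Wo + Wk, Wv
mlp = 3 * hidden * inter                                       # gate, up, down projections
embeddings = 2 * vocab * hidden                                # assumption: untied embeddings
total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")                       # lands near the stated 70B
```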

---

## **Citation**

```bibtex
@misc{llm360k2v2,
  title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author = {K2 Team},
  year = {2025},
  archivePrefix = {arXiv},
  eprint = {XXXX.XXXXX},
  primaryClass = {cs.CL}
}
```