---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---
|
|
|
|
|
# **K2-V2-Instruct** |
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/> |
|
|
|
|
|
📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)
|
|
|
|
|
K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family. |
|
|
|
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/sft-models.png" width="400" alt="K2-V2 SFT results"/> |
|
|
|
|
|
Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows. |
|
|
|
|
|
|
|
|
<img src="https://huggingface.co/LLM360/K2-V2/resolve/main/figures/base-models.png" width="400" alt="K2-V2 GPQA results"/> |
|
|
|
|
|
--- |
|
|
|
|
|
## **Quick Start** |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load this card's model and tokenizer (a 70B model; requires sufficient GPU memory).
model_id = "LLM360/K2-V2-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
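For longer, open-ended responses you may prefer sampling over greedy decoding. Continuing from the snippet above, the settings below are illustrative defaults, not officially recommended values for K2-V2:

```python
# Illustrative sampling settings (assumed, not official K2-V2 recommendations).
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```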
|
|
|
|
|
--- |
|
|
|
|
|
## **Evaluation Summary** |
|
|
|
|
|
| Model | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|-------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
|
|
|
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results. |
|
|
|
|
|
--- |
|
|
|
|
|
## **Datasets & Mixtures** |
|
|
|
|
|
### **SFT Mix** |
|
|
|
|
|
* **TxT360-3efforts**: curated instruction data plus mixed-difficulty reasoning traces
* Tool-calling demonstrations (see the sketch below)
* A small but high-value corpus chosen to showcase the model's potential
|
|
|
|
|
All mixtures, filtering rules, and data sources are fully released for reproducibility. |
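Because the SFT mix includes tool-calling demonstrations, tool schemas can be passed through the standard transformers chat-template interface. The sketch below assumes K2-V2-Instruct's chat template defines tool formatting; `get_weather` is a hypothetical function used only for illustration.

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical stub, for illustration only

tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2-Instruct")
messages = [{"role": "user", "content": "What's the weather in Abu Dhabi?"}]

# transformers converts the function's signature and docstring into a JSON schema
# and renders it into the prompt, provided the chat template defines tool formatting.
text = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(text)
```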
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed datasets and mixtures information. |
|
|
|
|
|
--- |
|
|
|
|
|
## **Model Description** |
|
|
- **Model type:** K2-V2 follows a standard decoder-only transformer architecture with grouped-query attention and RMSNorm (sketched below).
- **Training stage:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0
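As a quick reference, RMSNorm normalizes activations by their root-mean-square instead of the mean and variance used in LayerNorm. A minimal PyTorch sketch, using the ε = 1e-5 from the hyperparameter table below:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch: scales activations by their root-mean-square."""

    def __init__(self, hidden_size: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the hidden dimension, then apply a learned scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```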
|
|
|
|
|
|
|
|
|
|
|
| Model Hyperparameter | Value |
| ----------- | ----------- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (FFN) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use K2-V2-Instruct in your research, please cite the following: |
|
|
|
|
|
```bibtex
@misc{llm360_k2v2_2025,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```
|