|
|
---
license: apache-2.0
language:
- en
---
|
|
|
|
|
# **K2-V2** |
|
|
|
|
|
<img src="figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/> |
|
|
|
|
|
📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2) |
|
|
|
|
|
K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family. |
|
|
|
|
|
<img src="figures/sft-models.png" width="400" alt="K2-V2 SFT results"/> |
|
|
|
|
|
Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows. |
|
|
|
|
|
<img src="figures/base-models.png" width="400" alt="K2-V2 GPQA results"/> |
|
|
|
|
|
--- |
|
|
|
|
|
## **Quick Start** |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype="auto" keeps the checkpoint's native precision (instead of fp32);
# device_map="auto" shards the 70B model across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "LLM360/K2-V2", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2")

# K2-V2 is a base model, so prompt it with plain text rather than a chat template.
prompt = "Explain why the derivative of sin(x) is cos(x)."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
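
Given the 70B parameter count, full-precision inference typically needs multiple high-memory GPUs. If memory is tight, quantized loading is one option. The following is a minimal sketch assuming the `bitsandbytes` package is installed; 4-bit weights reduce memory substantially but will not exactly reproduce the reported scores.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight storage with bf16 compute; a memory-saving option, not the
# configuration used for the benchmark numbers reported below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "LLM360/K2-V2", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2")
```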
|
|
|
|
|
--- |
|
|
|
|
|
## **Evaluation Summary** |
|
|
|
|
|
Below we report performance on general, math & STEM, and coding benchmarks. Scores for the K2-V2 checkpoints (base → mid-4) show the impact of staged mid-training on reasoning quality. In each row, **bold** marks the best score and <u>underline</u> the second best.
|
|
|
|
|
| Task / Model | base | mid-1 | mid-2 | mid-3 | mid-4 | Qwen2.5-72B | Llama3.0-70B | Llama3.1-70B | Olmo3-32B | |
|
|
|--------------|------|-------|-------|-------|-------|--------------|---------------|---------------|------------| |
|
|
| **General Tasks** | | | | | | | | | | |
|
|
| **MMLU** | 74.3 | 74.4 | 73.5 | 75.0 | 75.2 | **86.1** | <u>79.5</u> | 79.3 | 75.2 | |
|
|
| **MMLU-Pro** | 43.7 | 46.8 | 48.1 | **59.8** | 57.0 | <u>58.1</u> | 52.8 | 53.8 | 49.6 | |
|
|
| **BBH** | 68.4 | 79.8 | 81.1 | 82.2 | <u>83.2</u> | **86.3** | 82.2 | 82.1 | 77.6 | |
|
|
| **HELLASWAG** | <u>87.8</u> | 86.9 | 86.6 | 86.6 | 86.0 | 87.6 | **88.0** | 85.0 | 84.8 | |
|
|
| **WINOGRANDE** | 82.6 | 83.7 | 83.7 | 83.7 | 83.0 | 83.9 | <u>85.3</u> | 79.8 | **90.3** | |
|
|
| **PIQA** | 84.2 | 84.0 | 83.3 | 82.9 | 83.1 | 83.5 | <u>84.6</u> | 84.3 | **85.6** | |
|
|
| **TRUTHFULQA** | 54.0 | 54.9 | 55.1 | <u>55.8</u> | 53.9 | **60.5** | 45.6 | 49.7 | 54.9 | |
|
|
| **Math & STEM Tasks** | | | | | | | | | | |
|
|
| **GPQA-DIAMOND** | 26.3 | 31.3 | 27.8 | <u>43.9</u> | **55.1** | 34.9 | 21.2 | 27.3 | 30.3 | |
|
|
| **GSM8K** | 68.0 | 76.4 | 82.1 | **93.6** | <u>92.5</u> | 91.2 | 83.2 | 81.1 | 80.5 | |
|
|
| **MATH** | 27.8 | 38.2 | 41.1 | **94.7** | <u>91.4</u> | 58.5 | 41.9 | 41.6 | 43.4 | |
|
|
| **AIME 2025** | 0.0 | 17.6 | 25.1 | **53.2** | <u>46.9</u> | 1.7 | 0.1 | 0.2 | 14.7 | |
|
|
| **ARC-CHALLENGE** | 64.9 | 66.4 | 66.4 | 66.0 | 66.3 | **72.4** | <u>69.2</u> | 64.9 | 65.4 | |
|
|
| **Coding Tasks** | | | | | | | | | | |
|
|
| **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 | |
|
|
| **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 | |
|
|
|
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results. |
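
The table can be approximately reproduced with standard evaluation tooling. Below is a hedged sketch using the EleutherAI `lm-evaluation-harness` (`pip install lm-eval`); the task names, few-shot settings, and harness version behind the reported numbers are documented in the tech report and may differ from this example.

```python
# Illustrative only: scores one task with lm-evaluation-harness; the settings
# here are assumptions, not the exact configuration used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=LLM360/K2-V2,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```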
|
|
|
|
|
--- |
|
|
|
|
|
## **Datasets & Mixtures** |
|
|
|
|
|
K2-V2 training is organized into three stages, each using a transparent, publicly released mixture: |
|
|
|
|
|
### **Pretraining Mix** |
|
|
|
|
|
* Large-scale natural text corpus spanning web content, books, code, and multilingual sources |
|
|
* Mixture designed for stable scaling and broad general-knowledge coverage |
|
|
* ~12T tokens |
|
|
|
|
|
### **Mid-Training Mix** |
|
|
|
|
|
* **TxT360-Midas**: reasoning-oriented + long-context extensions |
|
|
* Domain-focused sources: math, programming, scientific literature |
|
|
* Synthetic expansions where natural data is scarce |
|
|
|
|
|
### **SFT Mix** |
|
|
|
|
|
* See the [K2-V2-Instruct](https://huggingface.co/LLM360/K2-V2-Instruct) model card for the supervised fine-tuning mixture
|
|
|
|
|
All mixtures, filtering rules, and data sources are fully released for reproducibility. |
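
As a starting point for inspecting the released data, the sketch below streams a few records from the TxT360 corpus on the Hugging Face Hub. The repository name, config, and split are assumptions for illustration; the exact components of each K2-V2 mixture are listed in the tech report.

```python
# Stream a small sample without downloading the corpus; the repository, config,
# and split used here are assumptions made for this illustration.
from datasets import get_dataset_config_names, load_dataset

repo = "LLM360/TxT360"
configs = get_dataset_config_names(repo)     # discover the available subsets
print(configs)

stream = load_dataset(repo, configs[0], split="train", streaming=True)
for i, example in enumerate(stream):
    print(example)
    if i >= 2:
        break
```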
|
|
|
|
|
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed datasets and mixtures information. |
|
|
|
|
|
--- |
|
|
|
|
|
## **Model Description** |
|
|
- **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm. |
|
|
- **Training stage:** Pre-training |
|
|
- **Language(s) (NLP):** English |
|
|
- **License:** Apache 2.0 |
|
|
|
|
|
|
|
|
| Model Hyperparameter | Value | |
|
|
| ----------- | ----------- | |
|
|
| Total Parameters | 70B | |
|
|
| Hidden Size | 8,192 | |
|
|
| Intermediate Size (FFN) | 28,672 | |
|
|
| Number of Attention Heads | 64 | |
|
|
| Number of Layers | 80 | |
|
|
| RMSNorm ε | 1e-5 |
|
|
| Pre-training Seq Length | 8,192 | |
|
|
| Max Mid-training Seq Length | 524,288 | |
|
|
| Vocab Size | 250,000 | |
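
As a rough consistency check, the hyperparameters above approximately account for the 70B total. The estimate below additionally assumes a grouped-query attention layout with 8 KV heads, a gated MLP, and tied input/output embeddings; none of these details appear in the table, so treat it as illustrative arithmetic only.

```python
# Back-of-the-envelope parameter count from the table above; the KV-head count,
# gated MLP, and tied embeddings are assumptions made only for this estimate.
hidden, ffn, layers, vocab = 8192, 28672, 80, 250_000
n_heads, n_kv_heads = 64, 8                  # n_kv_heads: assumed GQA setting
head_dim = hidden // n_heads

attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)  # Q, O, K, V
mlp = 3 * hidden * ffn                                             # gate, up, down
embeddings = vocab * hidden                                        # assumed tied

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.1f}B parameters")     # ~70.5B, close to the listed 70B
```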
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## **Intended Use** |
|
|
|
|
|
K2-V2 is designed for: |
|
|
|
|
|
* research on large language models and reasoning |
|
|
* downstream fine-tuning (e.g., instruction following, agents, domain models) |
|
|
* experimentation with long-context architectures |
|
|
* open, transparent benchmarking of LLM scaling |
|
|
|
|
|
K2-V2 is **not** instruction-tuned. For aligned conversational use, please see **K2-V2-Instruct**. |
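
For the downstream fine-tuning use case listed above, the sketch below attaches LoRA adapters with the `peft` library. The target module names are Llama-style guesses rather than confirmed names for this checkpoint, and this is not the recipe used to produce K2-V2-Instruct.

```python
# Parameter-efficient fine-tuning setup with LoRA; the target_modules names are
# assumptions. Inspect model.named_modules() to confirm them for K2-V2.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LLM360/K2-V2", torch_dtype="auto", device_map="auto"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```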
|
|
|
|
|
--- |
|
|
|
|
|
## **Limitations** |
|
|
|
|
|
* May generate incorrect or hallucinated content, especially when asked about facts not seen during training |
|
|
* Not optimized for safety, moderation, or refusal behavior (base model) |
|
|
* Long-context performance depends on prompt quality and retrieval structure |
|
|
* Primarily trained on English; multilingual capabilities are limited |
|
|
* Inference cost is high due to the 70B parameter size |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use K2-V2 in your research, please cite the following: |
|
|
|
|
|
```bibtex
|
|
@misc{llm360_k2v2_2025, |
|
|
title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model}, |
|
|
author = {K2 Team}, |
|
|
year = {2025}, |
|
|
archivePrefix = {arXiv}, |
|
|
eprint = {XXXX.XXXXX}, |
|
|
primaryClass = {cs.CL} |
|
|
} |
|
|
``` |
|
|
|