hunterhector committed
Commit 6cb377c · verified · 1 parent: f0e8595

Update README.md

Files changed (1): README.md (+1 -30)
README.md CHANGED
@@ -7,24 +7,13 @@ library_name: transformers
 base_model: Qwen/Qwen2.5-32B
 ---
 
-<p align="center">
-<img alt="k2-think-banner" src="https://github.com/MBZUAI-IFM/K2-Think-SFT/blob/main/assets/k2-think-banner.png">
-</p>
-
 <p align="center">
 <a href="https://k2think.ai"><strong>Try K2-Think</strong></a> ·
-<a href="https://arxiv.org/abs/xxxxxxxx"><strong>Tech Report</strong></a> ·
-<a href="https://github.com/LLM360/Reasoning360"><strong>Code </strong></a>
 </p>
 
 <br>
 
-K2-Think is a 32 billion parameter open-weights general reasoning model with strong performance in competitive mathematical problem solving. Built on a Qwen2.5-32B base, K2-Think combines long CoT SFT, RL with verifiable rewards, and a test-time scaling scaffold to match or exceed much larger models on public math benchmarks while keeping latency low.
-
-## **Highlights**
-- **Math specialist at 32B:** State-of-the-art results among open models on AIME-style olympiad math and other hard math sets.
-- **Fast generation:** **~2,000 tokens/sec** on our Cerebras WSE deployment; **~10×** faster than typical H100/H200 setups in our tests.
-- **Token-efficient reasoning:** Planning reduces average response length by **up to ~14%** at equal or higher accuracy.
+K2-Think is a 32 billion parameter open-weights general reasoning model with strong performance in competitive mathematical problem solving.
 
 # Quickstart
 
@@ -82,24 +71,6 @@ We deploy K2-THINK on Cerebras Wafer-Scale Engine (WSE) systems, leveraging the
 | **Cerebras WSE (our deployment)** | **\~2,000** | **\~16 s** |
 | Typical **H100/H200** GPU setup | \~200 | \~160 s |
 
----
-
-## Token Efficiency
-
-K2-Think's **Plan-Before-You-Think** methodology combined with Best-of-N sampling produces more concise reasoning chains while maintaining or improving accuracy. Our test-time scaffold reduces average response length by up to **14%** across mathematical benchmarks.
-
-**Token reduction** per completed answer (SFT+RL checkpoint vs K2-Think):
-
-| Domain | Benchmark | SFT+RL Checkpoint | K2-Think | Δ |
-| -------- | -------------- | ----------------: | ---------: | ---------: |
-| **Math** | AIME24 | 23,324 | **20,058** | **−14.0%** |
-| **Math** | AIME25 | 25,869 | **24,218** | **−6.38%** |
-| **Math** | HMMT25 | 31,475 | **26,977** | **−14.3%** |
-| **Math** | OMNI-Math-HARD | 35,266 | **30,032** | **−14.0%** |
-| Code | LiveCodeBench | 13,552 | **12,166** | **−10.2%** |
-| Science | GPQA-Diamond | 15,271 | **14,661** | **−3.99%** |
-
-
 ---
 
 ## Safety Evaluation
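
A note on the speed table kept in the second hunk: its two rows describe the same workload, so the latency column follows directly from the throughput column once a response length is fixed. A quick sanity check, with the ~32,000-token response length inferred from the table itself (2,000 tokens/sec × 16 s) rather than stated in the README:

```python
# Latency sanity check for the deployment table: time = tokens / throughput.
# The ~32,000-token response length is an inference from the table
# (2,000 tok/s * 16 s), not a figure stated in the README.
response_tokens = 32_000

for setup, tok_per_s in [("Cerebras WSE", 2_000), ("H100/H200 GPU", 200)]:
    print(f"{setup}: ~{response_tokens / tok_per_s:.0f} s per response")
# Cerebras WSE: ~16 s per response
# H100/H200 GPU: ~160 s per response
```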
 
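The removed Token Efficiency section attributes the shorter reasoning chains to a Plan-Before-You-Think scaffold combined with Best-of-N sampling. As a rough illustration of that general pattern only, not the published K2-Think scaffold, here is a minimal sketch against an OpenAI-compatible endpoint; the base URL, model id, prompts, and the shortest-answer selection heuristic are all illustrative assumptions:

```python
# Minimal sketch of a plan-before-you-think + best-of-N scaffold.
# NOT the K2-Think implementation: the endpoint, model id, prompts,
# and the selection heuristic below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical local server
MODEL = "LLM360/K2-Think"  # hypothetical model id

def generate(prompt: str, temperature: float = 0.7) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def solve(problem: str, n: int = 3) -> str:
    # 1) Plan before thinking: elicit a short high-level plan first.
    plan = generate(f"Write a brief plan for solving this problem:\n{problem}")
    # 2) Best-of-N: sample n independent solutions conditioned on the plan.
    candidates = [
        generate(f"Problem:\n{problem}\n\nPlan:\n{plan}\n\nNow solve it step by step.")
        for _ in range(n)
    ]
    # 3) Select a winner. A real scaffold would use a verifier or reward
    #    model; this placeholder simply prefers the shortest candidate.
    return min(candidates, key=len)
```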
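The Δ column in the removed token-reduction table is the relative change in average tokens per completed answer. Recomputing it from the two token columns reproduces every row except OMNI-Math-HARD, which works out to roughly −14.8% from the rounded averages shown rather than the listed −14.0% (plausibly computed from unrounded data):

```python
# Recompute the Δ column of the removed table:
# relative change in average tokens per completed answer.
rows = {
    "AIME24": (23_324, 20_058),
    "AIME25": (25_869, 24_218),
    "HMMT25": (31_475, 26_977),
    "OMNI-Math-HARD": (35_266, 30_032),
    "LiveCodeBench": (13_552, 12_166),
    "GPQA-Diamond": (15_271, 14_661),
}
for benchmark, (sft_rl, k2_think) in rows.items():
    delta = 100 * (k2_think - sft_rl) / sft_rl
    print(f"{benchmark}: {delta:+.2f}%")
# AIME24 -14.00%, AIME25 -6.38%, HMMT25 -14.29%,
# OMNI-Math-HARD -14.84% (table lists -14.0%),
# LiveCodeBench -10.23%, GPQA-Diamond -3.99%
```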