yuyuzhang committed on
Commit 8d4693b · verified · 1 Parent(s): 9e4d51e

Update README.md

Files changed (1)
  1. README.md +4 -82
README.md CHANGED
@@ -88,89 +88,11 @@ print(response)
  ```
 
  ## Evaluation
+ Seed-Coder-8B-Reasoning achieves impressive performance on competitive programming, demonstrating that smaller LLMs can also be competent at complex reasoning tasks. Our model surpasses QwQ-32B and DeepSeek-R1 on IOI 2024, and achieves an Elo rating comparable to o1-mini on Codeforces contests.
 
- Seed-Coder-8B-Reasoning has been evaluated extensively on reasoning-intensive code benchmarks, showing:
- - Significant improvements on **competitive programming** datasets and coding challenges.
- - Enhanced ability to **break down complex problems**, **design correct algorithms**, and **produce efficient implementations**.
- - Strong generalization to unseen problems across multiple domains (math, strings, arrays, graphs, DP, etc.).
-
- <table>
- <tr>
- <th rowspan="2">Model</th>
- <th colspan="3">LiveCodeBench-Hard</th>
- <th colspan="3">LiveCodeBench-Medium</th>
- <th colspan="3">LiveCodeBench-Easy</th>
- <th rowspan="2">Overall</th>
- </tr>
- <tr>
- <th>4mon</th><th>3mon</th><th>2mon</th>
- <th>4mon</th><th>3mon</th><th>2mon</th>
- <th>4mon</th><th>3mon</th><th>2mon</th>
- </tr>
-
- <!-- ~8B Models -->
- <tr><td colspan="11"><b>~8B Models</b></td></tr>
- <tr>
- <td>DeepSeek-R1-Distill-Qwen-7B</td>
- <td>11.3</td><td>10.7</td><td>9.6</td>
- <td>39.6</td><td>37.2</td><td>37.1</td>
- <td>76.2</td><td>77.1</td><td>67.1</td>
- <td>36.5</td>
- </tr>
- <tr>
- <td>DeepSeek-R1-Distill-Seed-Coder-8B</td>
- <td>13.6</td><td>13.9</td><td>13.4</td>
- <td>39.6</td><td>38.7</td><td>39.3</td>
- <td>79.8</td><td>80.2</td><td>73.2</td>
- <td>39.0</td>
- </tr>
- <tr>
- <td>OlympicCoder-7B</td>
- <td>12.7</td><td>11.8</td><td>12.5</td>
- <td>40.8</td><td>39.0</td><td>38.7</td>
- <td>78.0</td><td>77.1</td><td>67.8</td>
- <td>37.9</td>
- </tr>
- <tr>
- <td>Qwen3-8B-thinking</td>
- <td>27.5</td><td>23.5</td><td>19.7</td>
- <td>65.7</td><td>59.7</td><td>58.5</td>
- <td>98.0</td><td>98.1</td><td>97.3</td>
- <td>57.4</td>
- </tr>
- <tr>
- <td>Seed-Coder-8B-Reasoning</td>
- <td>27.6</td><td>28.0</td><td>31.0</td>
- <td>65.8</td><td>59.2</td><td>57.5</td>
- <td>87.8</td><td>88.0</td><td>80.1</td>
- <td>53.6</td>
- </tr>
-
- <!-- 13B+ Models -->
- <tr><td colspan="11"><b>13B+ Models</b></td></tr>
- <tr>
- <td>DeepSeek-R1-Distill-Qwen-14B</td>
- <td>21.3</td><td>20.5</td><td>16.1</td>
- <td>58.1</td><td>53.4</td><td>51.4</td>
- <td>93.3</td><td>94.2</td><td>93.7</td>
- <td>51.9</td>
- </tr>
- <tr>
- <td>Claude-3.7-Sonnet-thinking</td>
- <td>27.3</td><td>30.8</td><td>31.0</td>
- <td>54.5</td><td>55.1</td><td>51.4</td>
- <td>96.2</td><td>100.0</td><td>100.0</td>
- <td>53.3</td>
- </tr>
- <tr>
- <td>o3-mini-low</td>
- <td>30.3</td><td>32.3</td><td>28.6</td>
- <td>69.6</td><td>61.2</td><td>54.1</td>
- <td>98.7</td><td>100.0</td><td>100.0</td>
- <td>59.4</td>
- </tr>
- </table>
-
+ <p align="center">
+ <img width="50%" src="reasoning-ioi.jpg"> <img width="50%" src="reasoning-codeforces.jpg">
+ </p>
 
  For detailed benchmark performance, please refer to our [📑 Technical Report](https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf).
 