ByteDance-Seed/Seed-Coder-8B-Instruct

@@ -55,17 +55,16 @@ Seed-Coder-8B-Instruct demonstrates strong performance across a variety of codin
 - Robustness across different programming languages and domains.
 - Ability to understand, reason, and repair complex code snippets.
-|             Model             | Size | HumanEval | HumanEval (+) | MBPP | MBPP+ | MHPP | BigCodeBench (Full) | BigCodeBench (Hard) | LiveCodeBench (2410-2502) |
-|:-----------------------------:|-----:|:---------:|:-------------:|:----:|:-----:|:----:|:-------------------:|:-------------------:|:-------------------------:|
-| CodeLlama-7B-Instruct         |   7B |    40.9   |      33.5     | 54.0 |  44.4 |  6.7 |         21.9        |         3.4         |            3.6            |
-| DeepSeek-Coder-6.7B-Instruct  | 6.7B |    74.4   |      71.3     | 74.9 |  65.6 | 20.0 |         35.5        |         10.1        |            9.6            |
-| CodeQwen1.5-7B-Chat           |   7B |    83.5   |      78.7     | 77.7 |  67.2 | 17.6 |         39.6        |         18.9        |            3.0            |
-| Yi-Coder-9B-Chat              |   9B |    82.3   |      74.4     | 82.0 |  69.0 | 26.7 |         38.1        |         11.5        |            17.5           |
-| Llama-3.1-8B-Instruct         |   8B |    68.3   |      59.8     | 70.1 |  59.0 | 17.1 |         36.6        |         13.5        |            11.5           |
-| OpenCoder-8B-Instruct         |   8B |    83.5   |      78.7     | 79.1 |  69.0 | 30.5 |         40.3        |         16.9        |            17.1           |
-| Qwen2.5-Coder-7B-Instruct     |   7B |    88.4   |      84.1     | 82.0 |  71.4 | 26.7 |         41.0        |         18.2        |            17.3           |
-| Seed-Coder-8B-Instruct (0411) |   8B |    84.8   |      78.7     | 85.2 |  71.2 | 36.2 |         53.3        |         20.5        |            24.7           |
+|             Model             | HumanEval | MBPP | MHPP | BigCodeBench (Full) | BigCodeBench (Hard) | LiveCodeBench (2410-2502) |
+|:-----------------------------:|:---------:|:----:|:----:|:-------------------:|:-------------------:|:-------------------------:|
+| CodeLlama-7B-Instruct         |    40.9   | 54.0 |  6.7 |         21.9        |         3.4         |            3.6            |
+| DeepSeek-Coder-6.7B-Instruct  |    74.4   | 74.9 | 20.0 |         35.5        |         10.1        |            9.6            |
+| CodeQwen1.5-7B-Chat           |    83.5   | 77.7 | 17.6 |         39.6        |         18.9        |            3.0            |
+| Yi-Coder-9B-Chat              |    82.3   | 82.0 | 26.7 |         38.1        |         11.5        |            17.5           |
+| Llama-3.1-8B-Instruct         |    68.3   | 70.1 | 17.1 |         36.6        |         13.5        |            11.5           |
+| OpenCoder-8B-Instruct         |    83.5   | 79.1 | 30.5 |         40.3        |         16.9        |            17.1           |
+| Qwen2.5-Coder-7B-Instruct     |    88.4   | 82.0 | 26.7 |         41.0        |         18.2        |            17.3           |
+| Seed-Coder-8B-Instruct (0411) |    84.8   | 85.2 | 36.2 |         53.3        |         20.5        |            24.7           |
 For detailed results, please check our [📑 paper](https://arxiv.org/pdf/xxx.xxxxx).