Update README.md
README.md
CHANGED
|
@@ -43,7 +43,7 @@ It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistra
|
|
| 43 |
|
| 44 |
For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
|
| 45 |
|
| 46 |
-
Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
|
| 47 |
|
| 48 |
**Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
|
| 49 |
- Improved performance, please refer to the [benchmark results](#benchmark-results).
|
|
@@ -63,11 +63,11 @@ Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).
|
|
| 63 |
|
| 64 |
### SWE-Bench
|
| 65 |
|
| 66 |
-
Devstral Small 1.1 achieves a score of **
|
| 67 |
|
| 68 |
| Model | Agentic Scaffold | SWE-Bench Verified (%) |
|
| 69 |
|--------------------|--------------------|------------------------|
|
| 70 |
-
| Devstral Small 1.1 | OpenHands Scaffold | **
|
| 71 |
| Devstral Small 1.0 | OpenHands Scaffold | *46.8* |
|
| 72 |
| GPT-4.1-mini | OpenAI Scaffold | 23.6 |
|
| 73 |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
|
|
@@ -512,4 +512,4 @@ Finally, the game is ready to be played:
|
|
| 512 |
|
| 513 |

|
| 514 |
|
| 515 |
-
Don't hesitate to iterate or give more information to Devstral to improve the game!
|
|
|
|
| 43 |
|
| 44 |
For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
|
| 45 |
|
| 46 |
+
Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral-2507).
|
| 47 |
|
| 48 |
**Updates compared to [`Devstral Small 1.0`](https://huggingface.co/mistralai/Devstral-Small-2505):**
|
| 49 |
- Improved performance, please refer to the [benchmark results](#benchmark-results).
|
|
|
|
| 63 |
|
| 64 |
### SWE-Bench
|
| 65 |
|
| 66 |
+
Devstral Small 1.1 achieves a score of **53.6%** on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.
|
| 67 |
|
| 68 |
| Model | Agentic Scaffold | SWE-Bench Verified (%) |
|
| 69 |
|--------------------|--------------------|------------------------|
|
| 70 |
+
| Devstral Small 1.1 | OpenHands Scaffold | **53.6** |
|
| 71 |
| Devstral Small 1.0 | OpenHands Scaffold | *46.8* |
|
| 72 |
| GPT-4.1-mini | OpenAI Scaffold | 23.6 |
|
| 73 |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
|
|
|
|
| 512 |
|
| 513 |

|
| 514 |
|
| 515 |
+
Don't hesitate to iterate or give more information to Devstral to improve the game!
|