Update README.md
README.md

language:
- vi
- hi
- bn
---

# Magistral-Small-2506-Vision

This is an experimental checkpoint of Magistral Small 2506 with vision support.

Magistral Small is a GRPO-powered reasoning fine-tune of Mistral Small 3.1, which is a vision-capable LLM.

In its technical report, Mistral states that Magistral was fine-tuned on text-only data, yet the report also includes results on the MMMU, MMMU-Pro, and MathVista benchmarks, which show modest improvements despite the text-only training. This suggests that Magistral successfully generalized its reasoning capabilities to multimodal data.

Mistral removed Magistral's vision encoder in their official release, possibly because of the performance gap between text-only and multimodal inputs.

In this model, I grafted Mistral Small 3.1's vision encoder onto Magistral Small. No further training was done, so the text-only performance of this model should match Mistral's official release.
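
The graft is pure weight surgery: keep Magistral's language-model weights and copy the vision tower and multimodal projector over from Mistral Small 3.1. Here is a minimal sketch of the idea, assuming both checkpoints ship a single consolidated safetensors file and that the vision weights sit under the prefixes shown (file names and key prefixes are my assumptions, not confirmed details of these checkpoints):

```python
# Sketch: graft Mistral Small 3.1's vision weights onto Magistral Small.
# Paths and key prefixes are illustrative; check the actual checkpoints.
from safetensors.torch import load_file, save_file

donor = load_file("Mistral-Small-3.1/consolidated.safetensors")    # vision donor
base = load_file("Magistral-Small-2506/consolidated.safetensors")  # text/reasoning base

# Assumed prefixes for the vision tower and the vision-to-text projector.
VISION_PREFIXES = ("vision_encoder.", "vision_language_adapter.")

merged = dict(base)  # start from Magistral's weights, unchanged
for name, tensor in donor.items():
    if name.startswith(VISION_PREFIXES):
        merged[name] = tensor  # copy the vision weights over

save_file(merged, "Magistral-Small-2506-Vision/consolidated.safetensors")
```

Because the language-model weights are never touched, text-only behavior is unaffected by construction.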

The model was tested with vLLM and should work with any toolkit that supports Mistral Small 3.1. The Transformers implementation of Mistral 3 does not work well.
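
For reference, a minimal offline-inference sketch with vLLM might look like the following. The model path and image URL are placeholders, and the three `"mistral"` format flags assume the checkpoint is distributed in Mistral's native consolidated format rather than the HF Transformers layout:

```python
# Sketch: offline multimodal inference with vLLM (paths/URLs are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="path/to/Magistral-Small-2506-Vision",  # placeholder repo id or local path
    tokenizer_mode="mistral",   # load Mistral-native tokenizer, config, and weights
    config_format="mistral",
    load_format="mistral",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image, reasoning step by step."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=1024))
print(outputs[0].outputs[0].text)
```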

I will soon benchmark the model on several vision benchmarks.