Update README.md
# Magistral-Small-2506-Vision

Inspired by https://huggingface.co/ngxson/Devstral-Small-Vision-2505-GGUF, a similar vision experiment for Devstral, this is an experimental checkpoint of Magistral-Small-2506 with vision support.

Magistral Small is a GRPO-trained reasoning fine-tune of Mistral Small 3.1, which is a vision-capable LLM.
In its technical report, Mistral states that Magistral was fine-tuned on text-only data, yet the authors report modest improvements on the MMMU, MMMU-Pro, and MathVista benchmarks despite the text-only training. This suggests that Magistral successfully generalized its reasoning capabilities to multimodal data.
Mistral removed Magistral's vision encoder in their official release. This may be because of the performance gap between text-only and multimodal inputs.
In this model, I grafted Mistral Small 3.1's vision encoder onto Magistral Small. No further training was performed, so the text-only performance of this model should match Mistral's official release.
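The graft amounts to copying the vision-encoder tensors from one checkpoint into the other while leaving the language-model weights untouched. A minimal sketch of that idea, using plain dicts in place of real tensors — the `vision_tower.` prefix is an assumption for illustration, not necessarily the naming used in the actual checkpoints:

```python
# Sketch of the graft: copy every vision-encoder tensor from a donor state
# dict (standing in for Mistral Small 3.1) into the target (standing in for
# Magistral Small). The "vision_tower." prefix is an assumed naming scheme.

def graft_vision_encoder(target, donor, prefix="vision_tower."):
    """Return a new state dict: target weights plus the donor's vision weights."""
    merged = dict(target)  # keep all language-model weights from the target
    for name, tensor in donor.items():
        if name.startswith(prefix):
            merged[name] = tensor  # bring over vision-encoder weights only
    return merged

# Toy demonstration with floats standing in for tensors:
magistral = {"model.layers.0.weight": 1.0}                    # text-only model
mistral31 = {"model.layers.0.weight": 2.0,
             "vision_tower.patch_embed.weight": 3.0}          # has vision
merged = graft_vision_encoder(magistral, mistral31)
```

Note that the target's own language-model weights win whenever a key exists in both checkpoints; only keys under the vision prefix come from the donor.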
The model was tested with vLLM and should work with any toolkit supporting Mistral Small 3.1. The Transformers implementation of Mistral 3 does not work well.
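When served behind vLLM's OpenAI-compatible API, image inputs use the standard OpenAI multimodal message shape. A small sketch of building such a message — the image URL is a placeholder, not from this model card:

```python
# Sketch: the multimodal chat-message shape accepted by an OpenAI-compatible
# endpoint (such as one served by vLLM). The URL below is a placeholder.

def build_image_message(text, image_url):
    """Build one user message mixing text and an image, OpenAI-style."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_image_message("Describe this image.", "https://example.com/cat.png")
```

A message built this way can be passed in the `messages` list of a chat-completions request to the running server.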
I will soon benchmark the model on several vision benchmarks, since there may still be configuration errors in this model that reduce performance.