OptimusePrime commited on
Commit
9c00d47
·
verified ·
1 Parent(s): ffd6910

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +18 -1
README.md CHANGED
@@ -30,4 +30,21 @@ language:
30
  - vi
31
  - hi
32
  - bn
33
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30
  - vi
31
  - hi
32
  - bn
33
+ ---
34
+
35
+ # Magistral-Small-2506-Vision
36
+
37
+ Inspired by the ![https://huggingface.co/ngxson/Devstral-Small-Vision-2505-GGUF](Devstral vision experiment), this is an experimental checkpoint of ![https://huggingface.co/mistralai/Magistral-Small-2506](Magistral-Small-2506) with vision.
38
+
39
+ Magistral Small is a GRPO-powered reasoning fine-tune of ![https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503](Mistral Small 3.1), which is a vision-capable LLM.
40
+
41
+ In its technical report, Mistral states that Magistral was fine-tuned on text-only data, but the authors report results on MMMU, MMMU-Pro and MathVista benchmarks, which show modest improvements despite text-only training.
42
+ This suggests that Magistral successfully generalized its reasoning capabilities to multimodal data.
43
+
44
+ Mistral removed Magistral's vision encoder in their official. This may be because of the performance gap between text-only and multimodal inputs.
45
+
46
+ In this model, I grafted Mistral Small 3.1's vision encoder on to Magistral Small. No further training was done, which should mean that text-only performance of this model should be the same as Mistral's official release.
47
+
48
+ The model was tested with vLLM and should work with any toolkit supporting Mistral Small 3.1. The Transformers implementation of Mistral 3 does not work well.
49
+
50
+ I will soon benchmark the model on several vision benchmarks.