---
license: apache-2.0
---
# Devstral-Vision-Small-2507 GGUF

Quantized GGUF versions of [cognitivecomputations/Devstral-Vision-Small-2507](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507) - the multimodal coding specialist that combines Devstral's exceptional coding abilities with vision understanding.

## Model Description

This is the first vision-enabled version of Devstral, created by transplanting Devstral's language model weights into Mistral-Small-3.2's multimodal architecture. It enables:
- Converting UI screenshots to code
- Debugging visual rendering issues
- Implementing designs from mockups
- Understanding codebases with visual context

## Quantization Selection Guide

| Quantization | Size | Min RAM/VRAM | Recommended For | Quality | Notes |
|-------------|------|--------------|-----------------|---------|-------|
| **Q8_0** | 23GB | 24GB | RTX 3090/4090/A6000 users wanting maximum quality | β˜…β˜…β˜…β˜…β˜… | Near-lossless, best for production use |
| **Q6_K** | 18GB | 20GB | High-end GPUs with a focus on quality | β˜…β˜…β˜…β˜…β˜† | Excellent quality/size balance |
| **Q5_K_M** | 16GB | 18GB | RTX 3080 Ti/4070 Ti users | β˜…β˜…β˜…β˜…β˜† | Great balance of quality and performance |
| **Q4_K_M** | 13GB | 16GB | **Most users** - RTX 3060 12GB/3070/4060 | β˜…β˜…β˜…β˜†β˜† | The sweet spot, minimal quality loss |
| **IQ4_XS** | 12GB | 14GB | Experimental - newer compression method | β˜…β˜…β˜…β˜†β˜† | Good alternative to Q4_K_M |
| **Q3_K_M** | 11GB | 12GB | 8-12GB GPUs, quality-conscious users | β˜…β˜…β˜†β˜†β˜† | Noticeable quality drop for complex code |

### Choosing the Right Quantization

**For coding-with-vision tasks, I recommend:**
- **Production/professional use**: Q8_0 or Q6_K
- **General development**: Q4_K_M (best balance)
- **Limited VRAM**: Q5_K_M if you can fit it, otherwise Q4_K_M
- **Experimental**: Try IQ4_XS for potentially better quality at a similar size to Q4_K_M

**Avoid Q3_K_M unless you're VRAM-constrained** - the quality degradation becomes noticeable for complex coding tasks and visual understanding.

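If you're unsure which tier your hardware falls into, check your free VRAM first. A minimal sketch, assuming an NVIDIA GPU with `nvidia-smi` on the PATH:

```bash
# Report total and currently free VRAM in MiB; pick the largest
# quantization whose memory requirement fits with some headroom.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```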
## Usage Examples

### With llama.cpp

```bash
# Download the model
huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
  Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --local-dir .

# Run with llama.cpp
./llama-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  -p "Analyze this UI and generate React code" \
  --image screenshot.png \
  -c 8192
```
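Depending on your llama.cpp build, image input may go through the dedicated multimodal CLI plus the model's mmproj projector file rather than `llama-cli`. A sketch, assuming a recent build with `llama-mtmd-cli` available and an mmproj file next to the model (the mmproj filename below is illustrative):

```bash
# Assumes the multimodal CLI is built; substitute the actual mmproj
# file shipped for this model - the name here is a placeholder.
./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image screenshot.png \
  -p "Analyze this UI and generate React code"
```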
### With LM Studio

1. Download your chosen quantization
2. Load in LM Studio
3. Enable multimodal/vision mode in settings
4. Drag and drop images into the chat
### With ollama

```bash
# Create Modelfile
cat > Modelfile << EOF
FROM ./Devstral-Small-Vision-2507-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF

# Create and run
ollama create devstral-vision -f Modelfile
ollama run devstral-vision
```
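To send an image programmatically, one option is ollama's HTTP API, which accepts base64-encoded images. A minimal sketch, assuming the server is running on the default port and GNU `base64` (use `base64 -i screenshot.png` on macOS):

```bash
# Base64-encode the screenshot without line wraps.
IMG=$(base64 -w0 screenshot.png)

# POST to /api/generate; the "images" array carries base64 image data.
curl http://localhost:11434/api/generate -d "{
  \"model\": \"devstral-vision\",
  \"prompt\": \"Analyze this UI and generate React code\",
  \"images\": [\"$IMG\"],
  \"stream\": false
}"
```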
### With koboldcpp

```bash
python koboldcpp.py --model Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --contextsize 8192 \
  --gpulayers 999 \
  --multimodal
```
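If your koboldcpp release doesn't recognize `--multimodal`, check `python koboldcpp.py --help`; in many versions vision is enabled by passing the model's projector file via `--mmproj` instead.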
## Performance Tips

1. **Context Size**: This model supports up to 128k context, but start with 8k-16k for better performance
2. **GPU Layers**: Offload all layers to the GPU if possible (`--gpulayers 999` or `-ngl 999`)
3. **Batch Size**: Increase the batch size for better throughput if you have VRAM headroom
4. **Temperature**: Use lower temperatures (0.1-0.3) for code generation, higher (0.7-0.9) for creative tasks - the sketch below combines these settings

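Putting these together, a representative llama.cpp invocation for code generation might look like this (the values are starting points, not tuned recommendations):

```bash
# Full GPU offload, moderate context, larger batch, low temperature for code.
./llama-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  -ngl 999 \
  -c 16384 \
  -b 512 \
  --temp 0.2 \
  -p "Refactor this function for readability:"
```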
## Hardware Requirements

| Quantization | Single GPU | Partial Offload | CPU Only |
|-------------|------------|-----------------|----------|
| Q8_0 | 24GB VRAM | 16GB VRAM + 16GB RAM | 32GB RAM |
| Q6_K | 20GB VRAM | 12GB VRAM + 16GB RAM | 24GB RAM |
| Q5_K_M | 18GB VRAM | 12GB VRAM + 12GB RAM | 24GB RAM |
| Q4_K_M | 16GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| IQ4_XS | 14GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| Q3_K_M | 12GB VRAM | 6GB VRAM + 12GB RAM | 16GB RAM |
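For the partial-offload column, the idea is to keep as many layers on the GPU as fit and stream the rest from system RAM. A sketch for an ~8GB card running Q4_K_M (the layer count is illustrative - lower `-ngl` until the model loads without out-of-memory errors):

```bash
# Offload roughly half the layers to an 8GB GPU; tune -ngl empirically.
./llama-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  -ngl 24 \
  -c 8192 \
  -p "Explain this stack trace:"
```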
## Model Capabilities

βœ… **Strengths:**
- Exceptional at converting visual designs to code
- Strong debugging abilities with visual context
- Maintains Devstral's 53.6% SWE-Bench performance
- Handles multiple programming languages
- 128k token context window

⚠️ **Limitations:**
- Not specifically fine-tuned for vision-to-code tasks
- Vision performance bounded by Mistral-Small-3.2's capabilities
- Requires decent hardware for optimal performance
- Quantization impacts both vision and coding quality

## License

Apache 2.0 (inherited from base models)

## Acknowledgments

- Original model by [Eric Hartford](https://erichartford.com/) at [Cognitive Computations](https://cognitivecomputations.ai/)
- Built on [Mistral AI](https://mistral.ai/)'s Devstral and Mistral-Small models
- Quantized using llama.cpp

## Links

- [Original Model](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507)
- [Devstral Base](https://huggingface.co/mistralai/Devstral-Small-2507)
- [Mistral-Small Vision](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506)

---

*For issues or questions about these quantizations, please open an issue in the repository.*