---
license: apache-2.0
---

NEW - I exported and added `mmproj-BF16.gguf` so that vision works properly in llama.cpp, ollama, and LM Studio.

# Devstral-Vision-Small-2507 GGUF

Quantized GGUF versions of [cognitivecomputations/Devstral-Vision-Small-2507](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507) - the multimodal coding specialist that combines Devstral's exceptional coding abilities with vision understanding.

## Model Description

This is the first vision-enabled version of Devstral, created by transplanting Devstral's language model weights into Mistral-Small-3.2's multimodal architecture. It enables:
- Converting UI screenshots to code
- Debugging visual rendering issues
- Implementing designs from mockups
- Understanding codebases with visual context

## Quantization Selection Guide

| Quantization | Size | Min RAM | Recommended For | Quality | Notes |
|-------------|------|---------|-----------------|---------|-------|
| **Q8_0** | 23GB | 24GB | RTX 3090/4090/A6000 users wanting maximum quality | β˜…β˜…β˜…β˜…β˜… | Near-lossless, best for production use |
| **Q6_K** | 18GB | 20GB | High-end GPUs with focus on quality | β˜…β˜…β˜…β˜…β˜† | Excellent quality/size balance |
| **Q5_K_M** | 16GB | 18GB | RTX 3080 Ti/4070 Ti users | β˜…β˜…β˜…β˜…β˜† | Great balance of quality and performance |
| **Q4_K_M** | 13GB | 16GB | **Most users** - RTX 3060 12GB/3070/4060 | β˜…β˜…β˜…β˜†β˜† | The sweet spot, minimal quality loss |
| **IQ4_XS** | 12GB | 14GB | Experimental - newer compression method | β˜…β˜…β˜…β˜†β˜† | Good alternative to Q4_K_M |
| **Q3_K_M** | 11GB | 12GB | 8-12GB GPUs, quality-conscious users | β˜…β˜…β˜†β˜†β˜† | Noticeable quality drop for complex code |

### Choosing the Right Quantization

**For coding with vision tasks, I recommend:**
- **Production/Professional use**: Q8_0 or Q6_K
- **General development**: Q4_K_M (best balance)
- **Limited VRAM**: Q5_K_M if you can fit it, otherwise Q4_K_M
- **Experimental**: Try IQ4_XS for potentially better quality at a similar size to Q4_K_M

**Avoid Q3_K_M unless you're VRAM-constrained** - the quality degradation becomes noticeable for complex coding tasks and visual understanding.
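
These files were produced with llama.cpp's `llama-quantize` tool. If you need a size that isn't listed, you can generate it yourself from a full-precision GGUF export; a minimal sketch (the F16 input filename is an assumption - substitute whatever your export is called):

```bash
# Quantize a full-precision GGUF down to Q4_K_M.
# Usage: llama-quantize <input.gguf> <output.gguf> <type>
./llama-quantize Devstral-Small-Vision-2507-F16.gguf \
  Devstral-Small-Vision-2507-Q4_K_M.gguf \
  Q4_K_M
```

The vision projector (`mmproj-BF16.gguf`) stays at BF16; only the language model is quantized.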

## Usage Examples

### With llama.cpp

```bash
# Download the model
huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
  Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --local-dir .
huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
  mmproj-BF16.gguf \
  --local-dir .

# Run with llama.cpp's multimodal CLI (llama-mtmd-cli in recent builds);
# the mmproj file downloaded above must be passed for vision to work
./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  -p "Analyze this UI and generate React code" \
  --image screenshot.png \
  -c 8192
```
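
llama.cpp's server can expose the same model over an OpenAI-compatible API; recent builds accept the projector via `--mmproj`. A sketch (adjust `-ngl` and the port for your setup):

```bash
# Serve with vision support; clients can then send images through the
# OpenAI-compatible /v1/chat/completions endpoint.
./llama-server -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  -c 8192 -ngl 999 --port 8080
```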

### With LM Studio

1. Download your chosen quantization
2. Load in LM Studio
3. Enable multimodal/vision mode in settings
4. Drag and drop images into the chat

### With ollama

```bash
# Create Modelfile
cat > Modelfile << EOF
FROM ./Devstral-Small-Vision-2507-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF

# Create and run
ollama create devstral-vision -f Modelfile
ollama run devstral-vision
```
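
Ollama's generate API accepts base64-encoded images for multimodal models. Assuming your ollama build wires up the vision projector for this import (behavior varies by release), a request might look like:

```bash
# Send a screenshot to the imported model via ollama's REST API.
# (base64 -w 0 is GNU coreutils; on macOS use `base64 -i screenshot.png`.)
IMG=$(base64 -w 0 screenshot.png)
curl http://localhost:11434/api/generate -d '{
  "model": "devstral-vision",
  "prompt": "Convert this UI screenshot to React code",
  "images": ["'"$IMG"'"]
}'
```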

### With koboldcpp

```bash
# koboldcpp takes the vision projector via --mmproj
python koboldcpp.py --model Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --contextsize 8192 \
  --gpulayers 999 \
  --mmproj mmproj-BF16.gguf
```

## Performance Tips

1. **Context Size**: This model supports up to 128k context, but start with 8k-16k for better performance
2. **GPU Layers**: Offload all layers to GPU if possible (`--gpulayers 999` or `-ngl 999`)
3. **Batch Size**: Increase batch size for better throughput if you have VRAM headroom
4. **Temperature**: Use lower temperatures (0.1-0.3) for code generation, higher (0.7-0.9) for creative tasks
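
Putting these tips together, a typical invocation might look like this (a sketch - tune the numbers for your hardware):

```bash
# All layers on GPU, moderate context, larger batch, low temperature for code.
./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  -c 16384 -ngl 999 -b 512 --temp 0.2 \
  -p "Refactor the component in this screenshot" \
  --image screenshot.png
```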

## Hardware Requirements

| Quantization | Single GPU | Partial Offload | CPU Only |
|-------------|------------|-----------------|----------|
| Q8_0 | 24GB VRAM | 16GB VRAM + 16GB RAM | 32GB RAM |
| Q6_K | 20GB VRAM | 12GB VRAM + 16GB RAM | 24GB RAM |
| Q5_K_M | 18GB VRAM | 12GB VRAM + 12GB RAM | 24GB RAM |
| Q4_K_M | 16GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| IQ4_XS | 14GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| Q3_K_M | 12GB VRAM | 6GB VRAM + 12GB RAM | 16GB RAM |
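
For the partial-offload column, split the model between GPU and system RAM by offloading only some layers; a sketch (the layer count is illustrative - raise it until VRAM is full):

```bash
# Offload roughly half the layers to a smaller GPU; the rest run on CPU.
./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  -ngl 20 -c 8192 \
  -p "Describe this mockup" --image mockup.png
```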

## Model Capabilities

βœ… **Strengths:**
- Exceptional at converting visual designs to code
- Strong debugging abilities with visual context
- Maintains Devstral's 53.6% SWE-Bench performance
- Handles multiple programming languages
- 128k token context window

⚠️ **Limitations:**
- Not specifically fine-tuned for vision-to-code tasks
- Vision performance bounded by Mistral-Small-3.2's capabilities
- Requires decent hardware for optimal performance
- Quantization impacts both vision and coding quality

## License

Apache 2.0 (inherited from base models)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/HM7Pz3FubhAHd8cWGbtFz.png)

## Acknowledgments

- Original model by [Eric Hartford](https://erichartford.com/) at [Cognitive Computations](https://cognitivecomputations.ai/)
- Built on [Mistral AI](https://mistral.ai/)'s Devstral and Mistral-Small models
- Quantized using llama.cpp

## Links

- [Original Model](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507)
- [Devstral Base](https://huggingface.co/mistralai/Devstral-Small-2507)
- [Mistral-Small Vision](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506)

---

*For issues or questions about these quantizations, please open an issue in the repository.*