---
license: apache-2.0
---
**NEW:** I exported and added `mmproj-BF16.gguf` so vision works properly with llama.cpp, ollama, and LM Studio.
# Devstral-Vision-Small-2507 GGUF
Quantized GGUF versions of [cognitivecomputations/Devstral-Vision-Small-2507](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507) - the multimodal coding specialist that combines Devstral's exceptional coding abilities with vision understanding.
## Model Description
This is the first vision-enabled version of Devstral, created by transplanting Devstral's language model weights into Mistral-Small-3.2's multimodal architecture. It enables:
- Converting UI screenshots to code
- Debugging visual rendering issues
- Implementing designs from mockups
- Understanding codebases with visual context
## Quantization Selection Guide
| Quantization | Size | Min RAM | Recommended For | Quality | Notes |
|--------------|------|---------|-----------------|---------|-------|
| **Q8_0** | 23GB | 24GB | RTX 3090/4090/A6000 users wanting maximum quality | ⭐⭐⭐⭐⭐ | Near-lossless, best for production use |
| **Q6_K** | 18GB | 20GB | High-end GPUs with focus on quality | ⭐⭐⭐⭐☆ | Excellent quality/size balance |
| **Q5_K_M** | 16GB | 18GB | RTX 3080 Ti/4070 Ti users | ⭐⭐⭐⭐☆ | Great balance of quality and performance |
| **Q4_K_M** | 13GB | 16GB | **Most users** - RTX 3060 12GB/3070/4060 | ⭐⭐⭐☆☆ | The sweet spot, minimal quality loss |
| **IQ4_XS** | 12GB | 14GB | Experimental - newer compression method | ⭐⭐⭐☆☆ | Good alternative to Q4_K_M |
| **Q3_K_M** | 11GB | 12GB | 8-12GB GPUs, quality-conscious users | ⭐⭐☆☆☆ | Noticeable quality drop for complex code |
### Choosing the Right Quantization
**For coding with vision tasks, I recommend:**
- **Production/Professional use**: Q8_0 or Q6_K
- **General development**: Q4_K_M (best balance)
- **Limited VRAM**: Q5_K_M if you can fit it, otherwise Q4_K_M
- **Experimental**: Try IQ4_XS for potentially better quality at similar size to Q4_K_M
**Avoid Q3_K_M unless you're VRAM-constrained** - the quality degradation becomes noticeable for complex coding tasks and visual understanding.
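The guide above can be condensed into a small helper. `pick_quant` is a hypothetical function (not part of this repo) that maps free VRAM in GB to the suggested quantization from the table:

```shell
# Hypothetical helper: suggest a quantization from available VRAM (GB),
# following the Min RAM column of the table above.
pick_quant() {
  local vram_gb=$1
  if   [ "$vram_gb" -ge 24 ]; then echo "Q8_0"
  elif [ "$vram_gb" -ge 20 ]; then echo "Q6_K"
  elif [ "$vram_gb" -ge 18 ]; then echo "Q5_K_M"
  elif [ "$vram_gb" -ge 16 ]; then echo "Q4_K_M"
  elif [ "$vram_gb" -ge 14 ]; then echo "IQ4_XS"
  else echo "Q3_K_M"
  fi
}

pick_quant 16   # Q4_K_M
```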
## Usage Examples
### With llama.cpp
```bash
# Download the model
huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
Devstral-Small-Vision-2507-Q4_K_M.gguf \
--local-dir .
huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
mmproj-BF16.gguf \
--local-dir .
# Run with llama.cpp - vision models need the multimodal CLI
# and the mmproj file alongside the main weights
./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  -p "Analyze this UI and generate React code" \
  --image screenshot.png \
  -c 8192
```
### With LM Studio
1. Download your chosen quantization
2. Load in LM Studio
3. Enable multimodal/vision mode in settings
4. Drag and drop images into the chat
### With ollama
```bash
# Create Modelfile
cat > Modelfile << EOF
FROM ./Devstral-Small-Vision-2507-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF
# Create and run
ollama create devstral-vision -f Modelfile
ollama run devstral-vision
```
### With koboldcpp
```bash
python koboldcpp.py --model Devstral-Small-Vision-2507-Q4_K_M.gguf \
    --contextsize 8192 \
    --gpulayers 999 \
    --mmproj mmproj-BF16.gguf
```
## Performance Tips
1. **Context Size**: This model supports up to 128k context, but start with 8k-16k for better performance
2. **GPU Layers**: Offload all layers to GPU if possible (`--gpulayers 999` or `-ngl 999`)
3. **Batch Size**: Increase batch size for better throughput if you have VRAM headroom
4. **Temperature**: Use lower temperatures (0.1-0.3) for code generation, higher (0.7-0.9) for creative tasks
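To see why starting at 8k-16k context matters, here is a back-of-envelope KV-cache estimate. The architecture numbers are assumptions for illustration (40 layers, 8 KV heads, head dim 128, fp16 cache), not published specs for this model:

```shell
# Rough KV-cache size estimate. Assumed illustrative values:
# 40 layers, 8 KV heads (GQA), head dim 128, fp16 cache (2 bytes/value).
ctx=8192
bytes_per_token=$((2 * 40 * 8 * 128 * 2))   # K and V, across all layers
total_mib=$((bytes_per_token * ctx / 1024 / 1024))
echo "${total_mib} MiB"   # 1280 MiB at 8k context
```

Under these assumptions the cache grows linearly with context, so the same model at 128k context would need roughly 16x this amount, which is why long contexts are expensive even when the weights fit.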
## Hardware Requirements
| Quantization | Single GPU | Partial Offload | CPU Only |
|-------------|------------|-----------------|----------|
| Q8_0 | 24GB VRAM | 16GB VRAM + 16GB RAM | 32GB RAM |
| Q6_K | 20GB VRAM | 12GB VRAM + 16GB RAM | 24GB RAM |
| Q5_K_M | 18GB VRAM | 12GB VRAM + 12GB RAM | 24GB RAM |
| Q4_K_M | 16GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| IQ4_XS | 14GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
| Q3_K_M | 12GB VRAM | 6GB VRAM + 12GB RAM | 16GB RAM |
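For partial offload, a crude way to pick a `-ngl` value is to divide the VRAM you can spare by the approximate per-layer weight size. `layers_that_fit` is a hypothetical sketch assuming the ~13GB Q4_K_M file spreads evenly over an assumed 40 layers:

```shell
# Hypothetical back-of-envelope: how many layers fit in a VRAM budget (MiB),
# assuming Q4_K_M weights (~13 GB) spread evenly over an assumed 40 layers.
layers_that_fit() {
  local vram_mib=$1
  local model_mib=13312     # ~13 GB Q4_K_M
  local n_layers=40         # assumed layer count
  local per_layer=$((model_mib / n_layers))
  local fit=$((vram_mib / per_layer))
  [ "$fit" -gt "$n_layers" ] && fit=$n_layers
  echo "$fit"
}

layers_that_fit 8192   # suggested -ngl value for an 8 GB GPU
```

This ignores the KV cache and the vision projector, so leave headroom and reduce the value if you see out-of-memory errors.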
## Model Capabilities
✅ **Strengths:**
- Exceptional at converting visual designs to code
- Strong debugging abilities with visual context
- Maintains Devstral's 53.6% SWE-Bench performance
- Handles multiple programming languages
- 128k token context window
⚠️ **Limitations:**
- Not specifically fine-tuned for vision-to-code tasks
- Vision performance bounded by Mistral-Small-3.2's capabilities
- Requires decent hardware for optimal performance
- Quantization impacts both vision and coding quality
## License
Apache 2.0 (inherited from base models)

## Acknowledgments
- Original model by [Eric Hartford](https://erichartford.com/) at [Cognitive Computations](https://cognitivecomputations.ai/)
- Built on [Mistral AI](https://mistral.ai/)'s Devstral and Mistral-Small models
- Quantized using llama.cpp
## Links
- [Original Model](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507)
- [Devstral Base](https://huggingface.co/mistralai/Devstral-Small-2507)
- [Mistral-Small Vision](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506)
---
*For issues or questions about these quantizations, please open an issue in the repository.*