QuixiAI/Devstral-Vision-Small-2507-gguf

ehartford committed on Jul 11, 2025

Commit 4005952 (verified) · 1 parent: 58d3f2c

Create README.md

Files changed (1)

README.md +136 -0

README.md ADDED

	@@ -0,0 +1,136 @@

+---
+license: apache-2.0
+---
+# Devstral-Vision-Small-2507 GGUF
+Quantized GGUF versions of [cognitivecomputations/Devstral-Vision-Small-2507](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507) - the multimodal coding specialist that combines Devstral's exceptional coding abilities with vision understanding.
+## Model Description
+This is the first vision-enabled version of Devstral, created by transplanting Devstral's language model weights into Mistral-Small-3.2's multimodal architecture. It enables:
+- Converting UI screenshots to code
+- Debugging visual rendering issues
+- Implementing designs from mockups
+- Understanding codebases with visual context
+## Quantization Selection Guide
+| Quantization | Size | Min VRAM (full offload) | Recommended For | Quality | Notes |
+|-------------|------|---------|-----------------|---------|-------|
+| **Q8_0** | 23GB | 24GB | RTX 3090/4090/A6000 users wanting maximum quality | ★★★★★ | Near-lossless, best for production use |
+| **Q6_K** | 18GB | 20GB | High-end GPUs with focus on quality | ★★★★☆ | Excellent quality/size balance |
+| **Q5_K_M** | 16GB | 18GB | RTX 3080 Ti/4070 Ti users (partial offload) | ★★★★☆ | Great balance of quality and performance |
+| **Q4_K_M** | 13GB | 16GB | **Most users** - RTX 3060 12GB/3070/4060 (partial offload) | ★★★☆☆ | The sweet spot, minimal quality loss |
+| **IQ4_XS** | 12GB | 14GB | Experimental - newer compression method | ★★★☆☆ | Good alternative to Q4_K_M |
+| **Q3_K_M** | 11GB | 12GB | 8-12GB GPUs (partial offload below 12GB), quality-conscious users | ★★☆☆☆ | Noticeable quality drop for complex code |
+### Choosing the Right Quantization
+**For coding with vision tasks, I recommend:**
+- **Production/Professional use**: Q8_0 or Q6_K
+- **General development**: Q4_K_M (best balance)
+- **Limited VRAM**: Q5_K_M if you can fit it, otherwise Q4_K_M
+- **Experimental**: Try IQ4_XS for similar quality to Q4_K_M at a slightly smaller size (12GB vs 13GB)
+**Avoid Q3_K_M unless you're VRAM-constrained** - the quality degradation becomes noticeable for complex coding tasks and visual understanding.
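The selection guide above can be sketched as a small shell helper. This is only an illustration: the function name `pick_quant` is hypothetical, and the thresholds are the minimum-memory figures copied from the selection table, not measured values.

```shell
# Hypothetical helper: pick the largest quantization whose minimum-memory
# figure from the selection table fits in the given budget (whole GB).
pick_quant() {
  mem_gb=$1
  if   [ "$mem_gb" -ge 24 ]; then echo "Q8_0"
  elif [ "$mem_gb" -ge 20 ]; then echo "Q6_K"
  elif [ "$mem_gb" -ge 18 ]; then echo "Q5_K_M"
  elif [ "$mem_gb" -ge 16 ]; then echo "Q4_K_M"
  elif [ "$mem_gb" -ge 14 ]; then echo "IQ4_XS"
  elif [ "$mem_gb" -ge 12 ]; then echo "Q3_K_M"
  else
    # Below the smallest listed figure: nothing fits without partial offload.
    echo "none"
    return 1
  fi
}

pick_quant 16   # prints Q4_K_M
```

The helper always returns the largest quantization that fits, so it will suggest Q3_K_M for a 12GB budget even though the guide recommends avoiding it where possible.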
+## Usage Examples
+### With llama.cpp
+```bash
+# Download the model
+huggingface-cli download cognitivecomputations/Devstral-Vision-Small-2507-GGUF \
+  Devstral-Small-Vision-2507-Q4_K_M.gguf \
+  --local-dir .
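+# Run with llama.cpp. Image input uses the multimodal CLI (llama-mtmd-cli) plus a
+# projector file; mmproj.gguf is a placeholder name, not a file confirmed to be in this repo.
+./llama-mtmd-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf \
+  --mmproj mmproj.gguf \
+  -p "Analyze this UI and generate React code" \
+  --image screenshot.png \
+  -c 8192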
+```
+### With LM Studio
+1. Download your chosen quantization
+2. Load in LM Studio
+3. Enable multimodal/vision mode in settings
+4. Drag and drop images into the chat
+### With ollama
+```bash
+# Create Modelfile
+cat > Modelfile << EOF
+FROM ./Devstral-Small-Vision-2507-Q4_K_M.gguf
+PARAMETER temperature 0.7
+PARAMETER num_ctx 8192
+EOF
+# Create and run
+ollama create devstral-vision -f Modelfile
+ollama run devstral-vision
+```
+### With koboldcpp
+```bash
+# --mmproj takes the vision projector file; mmproj.gguf is a placeholder name
+python koboldcpp.py --model Devstral-Small-Vision-2507-Q4_K_M.gguf \
+  --contextsize 8192 \
+  --gpulayers 999 \
+  --mmproj mmproj.gguf
+```
+## Performance Tips
+1. **Context Size**: This model supports up to 128k context, but start with 8k-16k: the KV cache grows with context length, so smaller windows use less memory and run faster
+2. **GPU Layers**: Offload all layers to GPU if possible (`--gpulayers 999` or `-ngl 999`)
+3. **Batch Size**: Increase batch size for better throughput if you have VRAM headroom
+4. **Temperature**: Use lower temperatures (0.1-0.3) for code generation, higher (0.7-0.9) for creative tasks
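As a rough sketch, the tips above map onto llama.cpp flags as follows. `-c` (context size), `-ngl` (GPU layers), and `--temp` (temperature) are llama.cpp options; the function name `llama_flags` and the `code`/`creative` task labels are made up for this example, and the temperatures are simply midpoints of the suggested ranges.

```shell
# Hypothetical helper: turn the performance tips into llama.cpp flags.
llama_flags() {
  task=$1
  case "$task" in
    code)     temp=0.2 ;;  # inside the 0.1-0.3 range suggested for code generation
    creative) temp=0.8 ;;  # inside the 0.7-0.9 range suggested for creative tasks
    *) echo "unknown task: $task" >&2; return 1 ;;
  esac
  # Start at 8k context and request full GPU offload, as the tips suggest.
  echo "-c 8192 -ngl 999 --temp $temp"
}

llama_flags code   # prints: -c 8192 -ngl 999 --temp 0.2
```

The output is meant to be spliced into a command line unquoted, for example `./llama-cli -m Devstral-Small-Vision-2507-Q4_K_M.gguf $(llama_flags code)`. Batch size is left out because the tips give no specific value for it.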
+## Hardware Requirements
+| Quantization | Single GPU | Partial Offload | CPU Only |
+|-------------|------------|-----------------|----------|
+| Q8_0 | 24GB VRAM | 16GB VRAM + 16GB RAM | 32GB RAM |
+| Q6_K | 20GB VRAM | 12GB VRAM + 16GB RAM | 24GB RAM |
+| Q5_K_M | 18GB VRAM | 12GB VRAM + 12GB RAM | 24GB RAM |
+| Q4_K_M | 16GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
+| IQ4_XS | 14GB VRAM | 8GB VRAM + 12GB RAM | 20GB RAM |
+| Q3_K_M | 12GB VRAM | 6GB VRAM + 12GB RAM | 16GB RAM |
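To make the table easier to apply, here is a minimal sketch that reports which column a given machine satisfies. The function name `run_mode` and its output labels are hypothetical; the thresholds are copied from the table and are assumed to be whole gigabytes.

```shell
# Hypothetical helper: report which column of the hardware table a machine meets.
# Usage: run_mode QUANT VRAM_GB RAM_GB
run_mode() {
  quant=$1; vram=$2; ram=$3
  # full = single-GPU VRAM, pv/pr = partial-offload VRAM and RAM, cpu = CPU-only RAM
  case "$quant" in
    Q8_0)   full=24; pv=16; pr=16; cpu=32 ;;
    Q6_K)   full=20; pv=12; pr=16; cpu=24 ;;
    Q5_K_M) full=18; pv=12; pr=12; cpu=24 ;;
    Q4_K_M) full=16; pv=8;  pr=12; cpu=20 ;;
    IQ4_XS) full=14; pv=8;  pr=12; cpu=20 ;;
    Q3_K_M) full=12; pv=6;  pr=12; cpu=16 ;;
    *) echo "unknown quantization: $quant" >&2; return 1 ;;
  esac
  if   [ "$vram" -ge "$full" ]; then echo "single-gpu"
  elif [ "$vram" -ge "$pv" ] && [ "$ram" -ge "$pr" ]; then echo "partial-offload"
  elif [ "$ram" -ge "$cpu" ]; then echo "cpu-only"
  else echo "insufficient"
  fi
}

run_mode Q4_K_M 8 16   # prints partial-offload
```

The checks run from fastest to slowest configuration, so a machine that qualifies for several columns is reported under the first one it meets.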
+## Model Capabilities
+✅ **Strengths:**
+- Exceptional at converting visual designs to code
+- Strong debugging abilities with visual context
+- Carries over Devstral's language model weights; the 53.6% SWE-Bench Verified score is the base model's reported figure
+- Handles multiple programming languages
+- 128k token context window
+⚠️ **Limitations:**
+- Not specifically fine-tuned for vision-to-code tasks
+- Vision performance bounded by Mistral-Small-3.2's capabilities
+- Hardware demands are substantial: even Q3_K_M needs 12GB of VRAM for full GPU offload or 16GB of RAM on CPU only (see Hardware Requirements)
+- Quantization impacts both vision and coding quality
+## License
+Apache 2.0 (inherited from base models)
+## Acknowledgments
+- Original model by [Eric Hartford](https://erichartford.com/) at [Cognitive Computations](https://cognitivecomputations.ai/)
+- Built on [Mistral AI](https://mistral.ai/)'s Devstral and Mistral-Small models
+- Quantized using llama.cpp
+## Links
+- [Original Model](https://huggingface.co/cognitivecomputations/Devstral-Vision-Small-2507)
+- [Devstral Base](https://huggingface.co/mistralai/Devstral-Small-2507)
+- [Mistral-Small Vision](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506)
+---
+*For issues or questions about these quantizations, please open an issue in the repository.*