# DeepSeek-OCR QVLM 4-bit (SafeTensors)
This is a 4-bit quantized version of deepseek-ai/DeepSeek-OCR, produced with the QVLM (Quantized Vision Language Model) technique and saved in SafeTensors format for easy deployment.
## 📊 Model Statistics
| Metric | Value |
|---|---|
| Original Size | 6363.12 MB (6.21 GB) |
| Quantized Size | 2199.39 MB (2.15 GB) |
| Size Reduction | 4165.03 MB (65.46%) |
| Compression Ratio | 2.89x |
| Format | SafeTensors |
## 🔧 Quantization Details
- Method: QVLM 4-bit group-wise quantization
- Quantization Bits: 4
- Group Size: 128
- Vision Encoder: Quantized
- Language Model: Quantized
- Symmetric: False
- Parameters Quantized: 2,973,512,704 / 3,336,106,240 (89.13%)
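For reference, here is a minimal sketch of the asymmetric group-wise scheme these settings describe (4 bits, group size 128, per-group scale and zero-point). It is illustrative only, not the exact QVLM kernel, and assumes the tensor size divides evenly by the group size:

```python
import torch

def quantize_group_4bit(weight: torch.Tensor, group_size: int = 128):
    # Asymmetric 4-bit group-wise quantization sketch (symmetric=False).
    # Assumes weight.numel() is divisible by group_size; real kernels pad.
    flat = weight.reshape(-1, group_size)                 # one row per group
    w_min = flat.min(dim=1, keepdim=True).values
    w_max = flat.max(dim=1, keepdim=True).values
    scale = ((w_max - w_min) / 15.0).clamp(min=1e-8)      # 16 levels: 0..15
    zero_point = (-w_min / scale).round().clamp(0, 15)
    q = (flat / scale + zero_point).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero_point   # q would then be packed two values per int8
```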
## 🚀 Usage

### Basic Loading
```python
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True
)

# Load quantized model (weights only)
quantized_state_dict = load_file("model.safetensors")

# Note: you'll need to implement dequantization logic for inference.
# The quantization metadata is stored in the SafeTensors file's metadata.
```
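To see what was actually stored, you can enumerate the tensors in the loaded state dict; the key naming (which tensors hold packed weights vs. scales and zero-points) is export-specific, so verify it against `quantization_config.json`:

```python
# Inspect the quantized checkpoint: per-tensor name, shape, and dtype.
for name, tensor in quantized_state_dict.items():
    print(f"{name}: shape={tuple(tensor.shape)}, dtype={tensor.dtype}")
```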
### With Dequantization
```python
import json
from safetensors.torch import load_file, safe_open

model_path = "model.safetensors"

# Read the quantization metadata embedded in the SafeTensors header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata() or {}
    quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata;
# see the QVLM repository for the full implementation.
```
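As a starting point, here is a sketch of group-wise 4-bit dequantization. The nibble packing order and the scale/zero-point tensor layout are assumptions to verify against the embedded metadata and the QVLM repository:

```python
import torch

def dequantize_group_4bit(packed: torch.Tensor, scale: torch.Tensor,
                          zero_point: torch.Tensor, group_size: int = 128):
    # Assumes two 4-bit values per byte (low nibble first) and one
    # scale/zero-point per group of 128, shaped (num_groups, 1).
    b = packed.flatten().to(torch.uint8)   # reinterpret int8 storage as bytes
    low = b & 0x0F                         # first value in each byte
    high = (b >> 4) & 0x0F                 # second value in each byte
    q = torch.stack([low, high], dim=-1).reshape(-1, group_size).float()
    return (q - zero_point) * scale        # per-group affine dequantization
```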
## 📁 Model Files
- `model.safetensors`: Quantized weights in SafeTensors format (2199.39 MB)
- `config.json`: Model configuration with quantization settings
- `quantization_config.json`: Detailed quantization configuration
- `quantization_results.json`: Compression statistics
- `tokenizer.json`: Tokenizer vocabulary
- `tokenizer_config.json`: Tokenizer configuration
## 🎯 Performance
The quantized model achieves 2.9x compression while maintaining similar accuracy to the original model. The 4-bit quantization significantly reduces memory requirements, making it suitable for deployment on resource-constrained devices.
### Memory Requirements
- Original Model: ~6.2 GB VRAM
- Quantized Model: ~2.1 GB VRAM
- Savings: ~4.1 GB
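You can confirm these numbers from the checkpoint itself, for example by summing the raw tensor storage:

```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
total_bytes = sum(t.numel() * t.element_size() for t in state_dict.values())
print(f"Tensor storage: {total_bytes / 1024**2:.2f} MB")  # ~2199 MB expected
```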
## 🔍 Quantization Method
This model uses QVLM (Quantized Vision Language Models), which applies the following (a quick arithmetic check follows the list):
- Group-wise Quantization: Weights are divided into groups of 128 elements
- 4-bit Representation: Each weight is quantized to 4 bits (packed into int8 for efficiency)
- Per-group Scaling: Each group has its own scale and zero-point for better accuracy
- Selective Quantization: Only large weight matrices are quantized; small parameters remain in fp16
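These pieces also account for the 2.89x ratio in the table above. A back-of-the-envelope check, assuming fp16 storage for each group's scale and zero-point (an assumption, not something the config states):

```python
quantized_frac = 2_973_512_704 / 3_336_106_240  # 89.13% of params (from above)
bits_quantized = 4 + (16 + 16) / 128            # 4-bit values + assumed fp16 scale/zero-point per group
bits_fp16 = 16                                  # small parameters kept in fp16
avg_bits = quantized_frac * bits_quantized + (1 - quantized_frac) * bits_fp16
print(f"average bits/param:  {avg_bits:.2f}")       # ~5.53
print(f"compression vs fp16: {16 / avg_bits:.2f}x") # ~2.89x, matching the table
```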
## 📚 Citation
```bibtex
@article{deepseek-ocr,
  title={DeepSeek-OCR: Optical Character Recognition Model},
  author={DeepSeek-AI},
  year={2024}
}

@article{qvlm,
  title={QVLM: Quantized Vision Language Models},
  author={Wang, Changyuan},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}
```
## 📄 License
This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.
## 🙏 Acknowledgments
- Base Model: DeepSeek-AI
- Quantization Method: QVLM
- Format: SafeTensors
## ⚠️ Notes
- This is a quantized model that requires dequantization during inference
- For production use, implement the dequantization logic from the QVLM repository
- The model architecture remains the same; only weights are quantized
- All quantization metadata is embedded in the SafeTensors file
Quantized on 2026-01-05 using QVLM 4-bit quantization