---
license: apache-2.0
base_model: deepseek-ai/DeepSeek-OCR
tags:
- vision
- ocr
- quantized
- qvlm
- 4-bit
- deepseek
- safetensors
library_name: transformers
pipeline_tag: image-to-text
---
# DeepSeek-OCR QVLM 4-bit (SafeTensors)

This is a 4-bit quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR), produced with the QVLM (Quantized Vision Language Model) technique and saved in SafeTensors format for easy deployment.
## 📊 Model Statistics

| Metric | Value |
|--------|-------|
| **Original Size** | 6363.12 MB (6.21 GB) |
| **Quantized Size** | 2199.39 MB (2.15 GB) |
| **Size Reduction** | 4163.73 MB (65.44%) |
| **Compression Ratio** | 2.89x |
| **Format** | SafeTensors |
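As a quick sanity check, the derived figures in the table follow from the two measured sizes (plain Python arithmetic, no dependencies):

```python
# Recompute the derived statistics from the two measured sizes above.
original_mb = 6363.12
quantized_mb = 2199.39

reduction_mb = original_mb - quantized_mb          # absolute savings
reduction_pct = 100 * reduction_mb / original_mb   # relative savings
ratio = original_mb / quantized_mb                 # compression ratio

print(f"Reduction: {reduction_mb:.2f} MB ({reduction_pct:.2f}%)")
print(f"Compression ratio: {ratio:.2f}x")
```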
## 🔧 Quantization Details

- **Method:** QVLM 4-bit group-wise quantization
- **Quantization Bits:** 4
- **Group Size:** 128
- **Vision Encoder:** Quantized
- **Language Model:** Quantized
- **Symmetric:** False
- **Parameters Quantized:** 2,973,512,704 / 3,336,106,240 (89.13%)
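The settings above describe asymmetric (zero-point) 4-bit quantization over groups of 128 weights. A minimal sketch of that scheme, assuming a plain min/max affine mapping — QVLM's actual calibration may differ:

```python
import torch

def quantize_groupwise_4bit(w: torch.Tensor, group_size: int = 128):
    """Asymmetric 4-bit group-wise quantization sketch (min/max affine)."""
    flat = w.reshape(-1, group_size)                 # one row per group
    w_min = flat.min(dim=1, keepdim=True).values
    w_max = flat.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # 4 bits -> 16 levels
    zero_point = (-w_min / scale).round().clamp(0, 15)
    q = (flat / scale + zero_point).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point, shape):
    # Invert the affine mapping and restore the original tensor shape
    return ((q.float() - zero_point) * scale).reshape(shape)

torch.manual_seed(0)
w = torch.randn(256, 128)
q, s, z = quantize_groupwise_4bit(w)
w_hat = dequantize(q, s, z, w.shape)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())
```

Per-group scales keep the reconstruction error bounded by roughly one quantization step per group, which is why group size 128 outperforms a single per-tensor scale.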
## 🚀 Usage

### Basic Loading

```python
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True,
)

# Load the quantized weights only (a plain state dict, not a ready-to-run model)
quantized_state_dict = load_file("model.safetensors")

# Note: you still need dequantization logic for inference.
# The quantization metadata is stored in the SafeTensors file metadata.
```
### With Dequantization

```python
import json

from safetensors import safe_open
from safetensors.torch import load_file

model_path = "model.safetensors"

# Read the quantization metadata embedded in the file header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata()
quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load the raw (still-quantized) state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata.
# See the QVLM repository for the full implementation.
```
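The exact tensor layout depends on how QVLM serialized the weights. Purely as an illustration, assuming each quantized matrix `<name>` is stored alongside hypothetical `<name>.scales` and `<name>.zeros` companion tensors (check `quantization_config.json` for the real layout), the dequantization pass could look like:

```python
import torch

def dequantize_state_dict(state_dict: dict, group_size: int = 128) -> dict:
    """Sketch: dequantize a state dict, assuming a hypothetical
    `<name>.scales` / `<name>.zeros` companion-tensor layout."""
    out = {}
    for name, tensor in state_dict.items():
        if name.endswith((".scales", ".zeros")):
            continue  # companion tensors, consumed below
        scales = state_dict.get(f"{name}.scales")
        zeros = state_dict.get(f"{name}.zeros")
        if scales is None:
            out[name] = tensor  # not quantized (small params stay fp16)
            continue
        groups = tensor.reshape(-1, group_size).float()
        out[name] = ((groups - zeros) * scales).reshape(tensor.shape).half()
    return out

# Tiny demo with fake tensors in the assumed layout
sd = {
    "proj.weight": torch.randint(0, 16, (4, 128), dtype=torch.uint8),
    "proj.weight.scales": torch.full((4, 1), 0.1),
    "proj.weight.zeros": torch.full((4, 1), 8.0),
    "bias": torch.zeros(4, dtype=torch.float16),
}
dq = dequantize_state_dict(sd)
print(dq["proj.weight"].dtype, dq["proj.weight"].shape)
```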
## 📁 Model Files

- `model.safetensors` - Quantized weights in SafeTensors format (2199.39 MB)
- `config.json` - Model configuration with quantization settings
- `quantization_config.json` - Detailed quantization configuration
- `quantization_results.json` - Compression statistics
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration
## 🎯 Performance

The quantized model achieves **2.89x compression** while keeping accuracy close to the original model. The 4-bit quantization substantially reduces memory requirements, making the model suitable for deployment on resource-constrained devices.

### Memory Requirements

- **Original Model:** ~6.2 GB VRAM
- **Quantized Model:** ~2.1 GB VRAM
- **Savings:** ~4.1 GB
## 🔍 Quantization Method

This model uses QVLM (Quantized Vision Language Models), which applies:

1. **Group-wise Quantization:** Weights are divided into groups of 128 elements
2. **4-bit Representation:** Each weight is quantized to 4 bits (packed into int8 for efficiency)
3. **Per-group Scaling:** Each group has its own scale and zero-point for better accuracy
4. **Selective Quantization:** Only large weight matrices are quantized; small parameters remain in fp16
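Step 2's packing can be illustrated directly: two 4-bit codes share one byte, halving storage relative to one-byte-per-code. A sketch — QVLM's actual nibble order may differ:

```python
import torch

def pack_nibbles(q: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values (0..15) into one uint8 each."""
    pairs = q.reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).to(torch.uint8)

def unpack_nibbles(packed: torch.Tensor) -> torch.Tensor:
    """Recover the original 4-bit codes from packed bytes."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    return torch.stack([lo, hi], dim=1).reshape(-1)

q = torch.randint(0, 16, (1024,), dtype=torch.uint8)
packed = pack_nibbles(q)
print("storage halved:", q.numel(), "->", packed.numel())
```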
## 📚 Citation

```bibtex
@misc{deepseek-ocr,
  title={DeepSeek-OCR: Optical Character Recognition Model},
  author={DeepSeek-AI},
  year={2024}
}

@misc{qvlm,
  title={QVLM: Quantized Vision Language Models},
  author={Wang, Changyuan},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}
```
## 📄 License

This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.

## 🙏 Acknowledgments

- **Base Model:** [DeepSeek-AI](https://huggingface.co/deepseek-ai)
- **Quantization Method:** [QVLM](https://github.com/ChangyuanWang17/QVLM)
- **Format:** [SafeTensors](https://github.com/huggingface/safetensors)
## ⚠️ Notes

- This is a quantized model: the weights must be dequantized during inference
- For production use, implement the dequantization logic from the QVLM repository
- The model architecture is unchanged; only the weights are quantized
- All quantization metadata is embedded in the SafeTensors file

---

*Quantized on 2026-01-05 using QVLM 4-bit quantization*