---
license: apache-2.0
base_model: deepseek-ai/DeepSeek-OCR
tags:
- vision
- ocr
- quantized
- qvlm
- 4-bit
- deepseek
- safetensors
library_name: transformers
pipeline_tag: image-to-text
---

# DeepSeek-OCR QVLM 4-bit (SafeTensors)

This is a 4-bit quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR), produced with the QVLM (Quantized Vision Language Model) technique and saved in SafeTensors format for easy deployment.

## 📊 Model Statistics

| Metric | Value |
|--------|-------|
| **Original Size** | 6363.12 MB (6.21 GB) |
| **Quantized Size** | 2199.39 MB (2.15 GB) |
| **Size Reduction** | 4165.03 MB (65.46%) |
| **Compression Ratio** | 2.89x |
| **Format** | SafeTensors |

## 🔧 Quantization Details

- **Method:** QVLM 4-bit group-wise quantization
- **Quantization Bits:** 4
- **Group Size:** 128
- **Vision Encoder:** Quantized
- **Language Model:** Quantized
- **Symmetric:** False (asymmetric, with per-group zero-points)
- **Parameters Quantized:** 2,973,512,704 / 3,336,106,240 (89.13%)

## 🚀 Usage

### Basic Loading

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True,
)

# Download and load the quantized weights (weights only)
weights_path = hf_hub_download(
    repo_id="SamMikaelson/deepseek-ocr-qvlm-4bit",
    filename="model.safetensors",
)
quantized_state_dict = load_file(weights_path)

# Note: you still need dequantization logic before inference.
# The quantization metadata is stored in the SafeTensors file metadata.
```

### With Dequantization

```python
import json

from safetensors.torch import load_file, safe_open

model_path = "model.safetensors"

# Read the quantization metadata from the SafeTensors header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata()
quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load the quantized state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata; a hedged sketch
# appears under "Quantization Method" below. See the QVLM repository for
# the full implementation.
```

## 📁 Model Files

- `model.safetensors` - Quantized weights in SafeTensors format (2199.39 MB)
- `config.json` - Model configuration with quantization settings
- `quantization_config.json` - Detailed quantization configuration
- `quantization_results.json` - Compression statistics
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration

## 🎯 Performance

The quantized model achieves **2.9x compression** while maintaining accuracy similar to the original model. The 4-bit quantization significantly reduces memory requirements, making it suitable for deployment on resource-constrained devices.

### Memory Requirements

- **Original Model:** ~6.2 GB VRAM
- **Quantized Model:** ~2.1 GB VRAM
- **Savings:** ~4.1 GB

## 🔍 Quantization Method

This model uses QVLM (Quantized Vision Language Models), which applies:

1. **Group-wise Quantization:** Weights are divided into groups of 128 elements
2. **4-bit Representation:** Each weight is quantized to 4 bits (packed into int8 for efficiency)
3. **Per-group Scaling:** Each group has its own scale and zero-point for better accuracy
4. **Selective Quantization:** Only large weight matrices are quantized; small parameters remain in fp16 (see the sketch below)
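Since this card does not ship a dequantization routine, here is a minimal sketch consistent with the settings above (4 bits, group size 128, asymmetric scale/zero-point). The nibble order, the divisibility assumption, and the commented tensor-name suffixes are illustrative assumptions, not guarantees about this repo's layout; the authoritative implementation is in the [QVLM repository](https://github.com/ChangyuanWang17/QVLM).

```python
import math

import torch


def unpack_int4(packed: torch.Tensor, num_elements: int) -> torch.Tensor:
    """Unpack two 4-bit values per byte.

    Assumption: the even-indexed element sits in the low nibble and the
    odd-indexed element in the high nibble; QVLM's actual packing order
    may differ.
    """
    packed = packed.view(torch.uint8)  # reinterpret bytes, no copy
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    interleaved = torch.stack([low, high], dim=-1).flatten()
    return interleaved[:num_elements].to(torch.int32)


def dequantize_groupwise(
    packed: torch.Tensor,      # packed int8/uint8 weight bytes
    scale: torch.Tensor,       # one scale per group of 128 elements
    zero_point: torch.Tensor,  # one zero-point per group (asymmetric)
    shape: torch.Size,         # original weight shape
    group_size: int = 128,
) -> torch.Tensor:
    """Reconstruct fp16 weights as w = (q - zero_point) * scale per group."""
    n = math.prod(shape)
    q = unpack_int4(packed, n).float()
    # Assumes n is divisible by group_size (padding handling omitted).
    q = q.view(-1, group_size)
    w = (q - zero_point.view(-1, 1).float()) * scale.view(-1, 1).float()
    return w.view(shape).to(torch.float16)


# Hypothetical usage: the tensor-name suffixes below are an assumed
# convention, not guaranteed by this repo. Check quantization_config.json
# and the safetensors metadata for the actual key layout.
# w = dequantize_groupwise(
#     state_dict["...weight_packed"],
#     state_dict["...weight_scale"],
#     state_dict["...weight_zero_point"],
#     shape=torch.Size([4096, 4096]),
# )
```

Per-group scale and zero-point let the 4-bit grid adapt locally to each 128-element slice, which is why this asymmetric scheme tolerates skewed weight distributions better than a single per-tensor scale.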
## 📚 Citation

```bibtex
@article{deepseek-ocr,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={DeepSeek-AI},
  year={2025}
}

@article{qvlm,
  title={Q-VLM: Post-training Quantization for Large Vision-Language Models},
  author={Wang, Changyuan},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}
```

## 📄 License

This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.

## 🙏 Acknowledgments

- **Base Model:** [DeepSeek-AI](https://huggingface.co/deepseek-ai)
- **Quantization Method:** [QVLM](https://github.com/ChangyuanWang17/QVLM)
- **Format:** [SafeTensors](https://github.com/huggingface/safetensors)

## ⚠️ Notes

- This is a quantized model that requires dequantization during inference
- For production use, implement the dequantization logic from the QVLM repository
- The model architecture remains the same; only the weights are quantized
- All quantization metadata is embedded in the SafeTensors file

---

*Quantized on 2026-01-05 using QVLM 4-bit quantization*