---
license: apache-2.0
base_model: deepseek-ai/DeepSeek-OCR
tags:
- vision
- ocr
- quantized
- qvlm
- 4-bit
- deepseek
- safetensors
library_name: transformers
pipeline_tag: image-to-text
---
# DeepSeek-OCR QVLM 4-bit (SafeTensors)
This is a 4-bit quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) produced with the QVLM (Quantized Vision Language Model) post-training quantization method, saved in SafeTensors format for easy deployment.
## 📊 Model Statistics
| Metric | Value |
|--------|-------|
| **Original Size** | 6363.12 MB (6.21 GB) |
| **Quantized Size** | 2199.39 MB (2.15 GB) |
| **Size Reduction** | 4165.03 MB (65.46%) |
| **Compression Ratio** | 2.89x |
| **Format** | SafeTensors |
## 🔧 Quantization Details
- **Method:** QVLM 4-bit group-wise quantization
- **Quantization Bits:** 4
- **Group Size:** 128
- **Vision Encoder:** Quantized
- **Language Model:** Quantized
- **Symmetric:** False
- **Parameters Quantized:** 2,973,512,704 / 3,336,106,240 (89.13%)
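As a sanity check, the reported file size is consistent with this mix. A back-of-envelope estimate (assuming one fp16 scale and one fp16 zero-point per group of 128, which matches the group size above but is an assumption about the storage layout) lands very close to the 2,199 MB figure:

```python
# Rough size estimate for the 4-bit + fp16 mix reported above.
# Assumption: one fp16 scale and one fp16 zero-point per group of 128.
total_params = 3_336_106_240
quantized_params = 2_973_512_704           # stored at 4 bits (0.5 bytes each)
fp16_params = total_params - quantized_params
groups = quantized_params // 128

size_bytes = quantized_params // 2 + fp16_params * 2 + groups * 2 * 2
print(f"{size_bytes / 2**20:.0f} MB")      # ~2,198 MB vs. 2,199.39 MB reported
```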
## 🚀 Usage
### Basic Loading
```python
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True,
)

# Load quantized weights only; the tensors are still in packed 4-bit form
quantized_state_dict = load_file("model.safetensors")

# Note: you'll need to implement dequantization logic before inference.
# The quantization metadata is stored in the SafeTensors file metadata.
```
### With Dequantization
```python
import json
from safetensors.torch import load_file, safe_open

model_path = "model.safetensors"

# Read quantization metadata from the SafeTensors header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata()
    quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load the packed state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata;
# see the QVLM repository for the full implementation.
```
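For reference, per-group dequantization of 4-bit weights packed two-per-byte typically looks like the sketch below. The tensor layout (low nibble first, one fp16 scale/zero-point pair per group) is an assumption for illustration; check `quantization_metadata` for the layout this checkpoint actually uses.

```python
import torch

def dequantize_group_4bit(packed: torch.Tensor, scale: torch.Tensor,
                          zero_point: torch.Tensor,
                          group_size: int = 128) -> torch.Tensor:
    """Hypothetical layout: two 4-bit values per uint8 (low nibble first),
    one (scale, zero_point) pair per group of `group_size` weights."""
    low = packed & 0x0F                      # first value in each byte
    high = (packed >> 4) & 0x0F              # second value in each byte
    q = torch.stack([low, high], dim=-1).reshape(-1).to(torch.float32)
    # Affine dequantization, applied per group: w ~= (q - zero_point) * scale
    q = q.reshape(-1, group_size)
    w = (q - zero_point.reshape(-1, 1)) * scale.reshape(-1, 1)
    return w.reshape(-1)
```

In practice you would iterate over `state_dict`, dequantize each packed tensor, reshape it to the original weight shape, and load the result into an unquantized copy of the base model.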
## 📁 Model Files
- `model.safetensors` - Quantized weights in SafeTensors format (2199.39 MB)
- `config.json` - Model configuration with quantization settings
- `quantization_config.json` - Detailed quantization configuration
- `quantization_results.json` - Compression statistics
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration
## 🎯 Performance
The quantized model achieves **2.9x compression** while maintaining similar accuracy to the original model. The 4-bit quantization significantly reduces memory requirements, making it suitable for deployment on resource-constrained devices.
### Memory Requirements
- **Original Model:** ~6.2 GB VRAM
- **Quantized Model:** ~2.1 GB VRAM
- **Savings:** ~4.1 GB
## 🔍 Quantization Method
This model uses QVLM (Quantized Vision Language Models) which applies:
1. **Group-wise Quantization:** Weights are divided into groups of 128 elements
2. **4-bit Representation:** Each weight is quantized to 4 bits (packed into int8 for efficiency)
3. **Per-group Scaling:** Each group has its own scale and zero-point for better accuracy
4. **Selective Quantization:** Only large weight matrices are quantized; small parameters remain in fp16
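The four steps above can be sketched end to end. This is an illustrative re-implementation under the stated settings (asymmetric, group size 128), not the exact QVLM code:

```python
import torch

def quantize_group_4bit(w: torch.Tensor, group_size: int = 128):
    """Asymmetric 4-bit quantization with per-group scale/zero-point,
    packed two values per uint8 (illustrative layout)."""
    groups = w.reshape(-1, group_size)                  # 1. group-wise
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0      # 3. per-group scaling
    zero_point = torch.round(-w_min / scale)            #    (16 levels: 0..15)
    q = torch.clamp(torch.round(groups / scale) + zero_point, 0, 15)
    q = q.to(torch.uint8).reshape(-1)
    packed = (q[1::2] << 4) | q[0::2]                   # 2. two 4-bit values per byte
    return packed, scale.reshape(-1), zero_point.reshape(-1)
```

Round-tripping through `(q - zero_point) * scale` recovers each weight to within roughly one quantization step, which is why per-group scaling preserves accuracy better than a single scale for the whole tensor.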
## 📚 Citation
```bibtex
@article{deepseek-ocr,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2510.18234},
  year={2025}
}

@inproceedings{qvlm,
  title={Q-VLM: Post-training Quantization for Large Vision-Language Models},
  author={Wang, Changyuan and Wang, Ziwei and Xu, Xiuwei and Tang, Yansong and Zhou, Jie and Lu, Jiwen},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}
```
## 📄 License
This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.
## 🙏 Acknowledgments
- **Base Model:** [DeepSeek-AI](https://huggingface.co/deepseek-ai)
- **Quantization Method:** [QVLM](https://github.com/ChangyuanWang17/QVLM)
- **Format:** [SafeTensors](https://github.com/huggingface/safetensors)
## ⚠️ Notes
- This is a quantized model that requires dequantization during inference
- For production use, implement the dequantization logic from the QVLM repository
- The model architecture remains the same; only weights are quantized
- All quantization metadata is embedded in the SafeTensors file
---
*Quantized on 2026-01-05 using QVLM 4-bit quantization*