---
language:
- en
- zh
license: mit
library_name: transformers
tags:
- ocr
- quantization
- mbq
- deepseek
- vision-language
- standalone
base_model: deepseek-ai/DeepSeek-OCR
---
# DeepSeek-OCR MBQ Quantized Model (Standalone)
This is a **fully standalone** quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) using **MBQ (Mixed-precision post-training quantization)**.
✨ **No need to download the original model** - all architecture files included!
## Model Details
- **Base Model**: deepseek-ai/DeepSeek-OCR
- **Quantization Method**: MBQ (Mixed-precision Quantization)
- **Weight Precision**: 4-bit (mixed with 8-bit for sensitive layers)
- **Activation Precision**: 8-bit
- **Format**: SafeTensors (int8 quantized with scales)
- **Standalone**: All architecture files included ✅
## Quantization Statistics
| Metric | Value |
|--------|-------|
| Original Size | 6,672 MB (6.67 GB) |
| **Quantized Size** | **3,510 MB (3.51 GB)** |
| **Size Reduction** | **3,162 MB (47.4%)** |
| **Compression Ratio** | **1.90x** |
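The figures in the table are internally consistent; a quick sanity check using only the sizes reported above:

```python
# Sanity-check of the compression statistics from the table above
# (all input numbers are taken directly from the table).
original_mb = 6672
quantized_mb = 3510

saved_mb = original_mb - quantized_mb          # absolute size reduction in MB
reduction_pct = 100 * saved_mb / original_mb   # reduction as a percentage
ratio = original_mb / quantized_mb             # compression ratio

print(f"{saved_mb} MB saved ({reduction_pct:.1f}%), {ratio:.2f}x compression")
# → 3162 MB saved (47.4%), 1.90x compression
```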
## Quick Start (Standalone - No Original Model Needed!)
### Installation
```bash
pip install torch transformers safetensors accelerate pillow
```
### Simple Loading (Recommended)
```python
import torch
from transformers import AutoTokenizer, AutoModel
# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and tokenizer directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(
"SamMikaelson/deepseek-ocr-mbq-w4bit",
trust_remote_code=True
)
model = AutoModel.from_pretrained(
"SamMikaelson/deepseek-ocr-mbq-w4bit",
trust_remote_code=True,
torch_dtype=torch.bfloat16
)
# Load the quantized weights via the bundled helper.
# load_mbq_model.py ships with this repo; run from the downloaded snapshot
# directory (or add its location to sys.path) so the import resolves.
from load_mbq_model import load_mbq_model
state_dict = load_mbq_model("./")  # assumes the repo files are in the current directory
model.load_state_dict(state_dict)
model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
### Manual Loading with Dequantization
```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
"SamMikaelson/deepseek-ocr-mbq-w4bit",
trust_remote_code=True
)
# Download and load the quantized weights (int8 tensors plus scale factors)
from huggingface_hub import hf_hub_download
weights_path = hf_hub_download("SamMikaelson/deepseek-ocr-mbq-w4bit", "model.safetensors")
state_dict = load_file(weights_path)
# Separate weights and scales
weights = {}
scales = {}
for name, param in state_dict.items():
    if name.endswith('.scale'):
        # Pair each scale tensor with the weight it belongs to
        scales[name[: -len('.scale')]] = param
    else:
        weights[name] = param
# Dequantize weights
dequantized_state_dict = {}
for name, param in weights.items():
if name in scales:
scale = scales[name]
dequantized = (param.float() * scale).to(torch.bfloat16)
dequantized_state_dict[name] = dequantized
else:
dequantized_state_dict[name] = param
# Load model architecture (included in this repo!)
model = AutoModel.from_pretrained(
"SamMikaelson/deepseek-ocr-mbq-w4bit",
trust_remote_code=True,
torch_dtype=torch.bfloat16
)
# Load the quantized weights
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
## Model Files
### Core Files
- **model.safetensors** (3.51 GB): Quantized model weights (int8 + scales)
- **load_mbq_model.py**: Helper script for loading
### Architecture Files (from original model)
- **modeling_deepseekocr.py**: Main model architecture
- **modeling_deepseekv2.py**: DeepSeek V2 backbone
- **configuration_deepseek_v2.py**: Model configuration
- **deepencoder.py**: Vision encoder
- **conversation.py**: Conversation utilities
- **processor_config.json**: Processor configuration
### Tokenizer & Config
- **tokenizer.json**: Tokenizer vocabulary
- **tokenizer_config.json**: Tokenizer configuration
- **config.json**: Model configuration
- **special_tokens_map.json**: Special tokens
### Metadata
- **quantization_metadata.json**: Quantization details
- **quantization_report.json**: Compression statistics
## Advantages
- ✅ **Standalone**: All files included; no need to download the original model
- ✅ **Smaller size**: 47% reduction in model size
- ✅ **Easy loading**: Plain `AutoModel.from_pretrained()` with `trust_remote_code=True`
- ✅ **Compatible**: Works with the standard `transformers` library
- ✅ **Preserved quality**: Mixed precision maintains model performance
## MBQ Methodology
MBQ (mixed-precision post-training quantization) allocates bit-widths to layers according to their quantization sensitivity:
1. **Sensitivity Analysis**: Computes sensitivity scores using Hessian approximation
2. **Mixed Precision**: High-sensitivity layers (top 15%) → 8-bit, others → 4-bit
3. **Symmetric Quantization**: Efficient quantization scheme for weights and activations
4. **Storage**: Weights stored as int8 with separate scale factors for true compression
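As an illustration of step 4, here is a minimal sketch of symmetric per-channel int8 quantization with separate scale factors. This is an assumption about the general scheme, not the exact code used to produce this checkpoint, which may differ in details such as channel grouping:

```python
import torch

def quantize_symmetric_int8(w: torch.Tensor):
    """Symmetric per-output-channel quantization: q = round(w / scale)."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0  # one scale per row
    scale = scale.clamp(min=1e-8)                      # guard all-zero rows
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reverse step: int8 values times their per-channel scales
    return q.float() * scale

w = torch.randn(8, 16)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = (w - w_hat).abs().max().item()  # bounded by scale / 2 per channel
```

Storing `q` (1 byte/weight) plus a small scale tensor is what yields the real on-disk compression, and the dequantization loop in the manual-loading example above is exactly the reverse step `q.float() * scale`.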
## Performance
- **Memory Usage**: Reduced by 47.4%
- **Model Size**: From 6.67 GB to 3.51 GB
- **Standalone**: No dependency on original model repo ✅
- **Inference**: Lower memory footprint, faster loading
## Citation
If you use this quantized model, please cite:
```bibtex
@misc{deepseek-ocr-mbq,
author = {SamMikaelson},
title = {DeepSeek-OCR MBQ Quantized Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/SamMikaelson/deepseek-ocr-mbq-w4bit}}
}
```
Original model:
```bibtex
@misc{deepseek-ocr,
title={DeepSeek-OCR},
author={DeepSeek-AI},
year={2024},
howpublished={\url{https://huggingface.co/deepseek-ai/DeepSeek-OCR}}
}
```
## License
MIT License (same as the base model)
## Troubleshooting
If you encounter issues loading the model:
1. Ensure `trust_remote_code=True` is set
2. Install required packages: `pip install -r requirements.txt`
3. Check that you're using transformers >= 4.40.0
4. Use the provided `load_mbq_model.py` helper script
For questions or issues, please open an issue on the model repository.