DeepSeek-OCR MBQ Quantized Model (Standalone)
This is a fully standalone quantized version of deepseek-ai/DeepSeek-OCR using MBQ (Mixed-precision post-training quantization).
✨ No need to download the original model - all architecture files included!
Model Details
- Base Model: deepseek-ai/DeepSeek-OCR
- Quantization Method: MBQ (Mixed-precision Quantization)
- Weight Precision: 4-bit (mixed with 8-bit for sensitive layers)
- Activation Precision: 8-bit
- Format: SafeTensors (int8 quantized with scales)
- Standalone: All architecture files included ✅
Quantization Statistics
| Metric | Value |
|---|---|
| Original Size | 6,672 MB (6.67 GB) |
| Quantized Size | 3,510 MB (3.51 GB) |
| Size Reduction | 3,162 MB (47.4%) |
| Compression Ratio | 1.90x |
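As a quick consistency check on the table above, the reduction and ratio follow directly from the two sizes (a minimal sketch, not part of the release):

```python
# Sanity-check the reported compression numbers.
original_mb, quantized_mb = 6672, 3510

reduction_mb = original_mb - quantized_mb            # 3162 MB
reduction_pct = 100 * reduction_mb / original_mb     # ~47.4%
ratio = original_mb / quantized_mb                   # ~1.90x

print(f"Reduction: {reduction_mb} MB ({reduction_pct:.1f}%), ratio: {ratio:.2f}x")
```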
Quick Start (Standalone - No Original Model Needed!)
Installation
```bash
pip install torch transformers safetensors accelerate pillow
```
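The loading examples below import the bundled load_mbq_model.py helper from a local directory, so it helps to mirror the whole repository locally first. A minimal sketch using huggingface_hub (the local_dir path is just an example):

```python
from huggingface_hub import snapshot_download

# Download all repository files (weights, architecture code, helper script) locally.
local_dir = snapshot_download(
    repo_id="SamMikaelson/deepseek-ocr-mbq-w4bit",
    local_dir="./deepseek-ocr-mbq-w4bit",  # example path
)
print("Files downloaded to:", local_dir)
```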
Simple Loading (Recommended)
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights using the helper
from load_mbq_model import load_mbq_model
state_dict = load_mbq_model("./")  # Assumes files are in current directory
model.load_state_dict(state_dict)

model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
Manual Loading with Dequantization
```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

# Load quantized weights
state_dict = load_file("model.safetensors")

# Separate weights and scales
weights = {}
scales = {}
for name, param in state_dict.items():
    if '.scale' in name:
        scales[name.replace('.scale', '')] = param
    else:
        weights[name] = param

# Dequantize weights
dequantized_state_dict = {}
for name, param in weights.items():
    if name in scales:
        scale = scales[name]
        dequantized = (param.float() * scale).to(torch.bfloat16)
        dequantized_state_dict[name] = dequantized
    else:
        dequantized_state_dict[name] = param

# Load model architecture (included in this repo!)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the dequantized weights
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
Model Files
Core Files
- model.safetensors (3.51 GB): Quantized model weights (int8 + scales)
- load_mbq_model.py: Helper script for loading
Architecture Files (from original model)
- modeling_deepseekocr.py: Main model architecture
- modeling_deepseekv2.py: DeepSeek V2 backbone
- configuration_deepseek_v2.py: Model configuration
- deepencoder.py: Vision encoder
- conversation.py: Conversation utilities
- processor_config.json: Processor configuration
Tokenizer & Config
- tokenizer.json: Tokenizer vocabulary
- tokenizer_config.json: Tokenizer configuration
- config.json: Model configuration
- special_tokens_map.json: Special tokens
Metadata
- quantization_metadata.json: Quantization details
- quantization_report.json: Compression statistics
Advantages
✅ Standalone: All files included, no need to download original model
✅ Smaller Size: 47% reduction in model size
✅ Easy Loading: Simple AutoModel.from_pretrained() with trust_remote_code=True
✅ Compatible: Works with standard transformers library
✅ Preserved Quality: Mixed-precision maintains model performance
MBQ Methodology
MBQ (Mixed-precision post-training quantization) intelligently allocates different bit-widths to layers based on their sensitivity:
- Sensitivity Analysis: Computes sensitivity scores using Hessian approximation
- Mixed Precision: High-sensitivity layers (top 15%) → 8-bit, others → 4-bit
- Symmetric Quantization: Efficient quantization scheme for weights and activations
- Storage: Weights stored as int8 with separate scale factors for true compression
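As a toy illustration of the storage scheme described above (int8 values plus a separate scale, symmetric around zero), here is a minimal per-tensor sketch; the actual MBQ pipeline additionally uses Hessian-based sensitivity scores and mixed 4/8-bit widths, which this example does not reproduce:

```python
import torch

def symmetric_quantize(w: torch.Tensor, n_bits: int = 8):
    """Quantize a tensor to signed integers with a single (per-tensor) scale."""
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for 8-bit
    scale = w.abs().max() / qmax                      # symmetric scale factor
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate bfloat16 tensor from int8 values and a scale."""
    return (q.float() * scale).to(torch.bfloat16)

w = torch.randn(256, 256)
q, scale = symmetric_quantize(w)
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat.float()).abs().max().item())
```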
Performance
- Memory Usage: Reduced by 47.4%
- Model Size: From 6.67 GB to 3.51 GB
- Standalone: No dependency on original model repo ✅
- Inference: Lower memory footprint, faster loading
Citation
If you use this quantized model, please cite:
```bibtex
@misc{deepseek-ocr-mbq,
  author = {SamMikaelson},
  title = {DeepSeek-OCR MBQ Quantized Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SamMikaelson/deepseek-ocr-mbq-w4bit}}
}
```
Original model:
```bibtex
@misc{deepseek-ocr,
  title = {DeepSeek-OCR},
  author = {DeepSeek-AI},
  year = {2024},
  howpublished = {\url{https://huggingface.co/deepseek-ai/DeepSeek-OCR}}
}
```
License
MIT License (same as the base model)
Troubleshooting
If you encounter issues loading the model:
- Ensure `trust_remote_code=True` is set
- Install required packages: `pip install -r requirements.txt`
- Check that you're using transformers >= 4.40.0 (see the version check below)
- Use the provided `load_mbq_model.py` helper script
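A quick way to verify the environment and that the remote-code files sit next to the weights (the local path is just an example):

```python
import os
import transformers

print("transformers version:", transformers.__version__)  # should be >= 4.40.0

repo_dir = "./deepseek-ocr-mbq-w4bit"  # example local path
for fname in ["model.safetensors", "load_mbq_model.py", "modeling_deepseekocr.py"]:
    path = os.path.join(repo_dir, fname)
    print(fname, "found" if os.path.exists(path) else "MISSING")
```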
For questions or issues, please open an issue on the model repository.