---
language:
- en
- zh
license: mit
library_name: transformers
tags:
- ocr
- quantization
- mbq
- deepseek
- vision-language
- standalone
base_model: deepseek-ai/DeepSeek-OCR
---
# DeepSeek-OCR MBQ Quantized Model (Standalone)
This is a **fully standalone** quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) using **MBQ (Mixed-precision post-training quantization)**.
**No need to download the original model** - all architecture files included!
## Model Details
- **Base Model**: deepseek-ai/DeepSeek-OCR
- **Quantization Method**: MBQ (mixed-precision post-training quantization)
- **Weight Precision**: 4-bit (mixed with 8-bit for sensitive layers)
- **Activation Precision**: 8-bit
- **Format**: SafeTensors (int8 quantized with scales)
- **Standalone**: All architecture files included ✅
## Quantization Statistics
| Metric | Value |
|--------|-------|
| Original Size | 6,672 MB (6.67 GB) |
| **Quantized Size** | **3,510 MB (3.51 GB)** |
| **Size Reduction** | **3,162 MB (47.4%)** |
| **Compression Ratio** | **1.90x** |
## Quick Start (Standalone - No Original Model Needed!)
### Installation
```bash
pip install torch transformers safetensors accelerate pillow
```
### Simple Loading (Recommended)
```python
import sys
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import snapshot_download

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the repo locally so the bundled helper script can be imported
repo_dir = snapshot_download("SamMikaelson/deepseek-ocr-mbq-w4bit")
sys.path.insert(0, repo_dir)

# Load tokenizer and model architecture directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(repo_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_dir,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Load and dequantize the MBQ weights using the bundled helper
from load_mbq_model import load_mbq_model

state_dict = load_mbq_model(repo_dir)
model.load_state_dict(state_dict)
model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
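Once loaded, inference should follow the original DeepSeek-OCR interface. A minimal usage sketch, assuming the remote code in this repo keeps the upstream model's `infer` helper and that `sample.png` is an image you supply:

```python
# Hypothetical usage - the prompt format and the `infer` signature follow the
# upstream DeepSeek-OCR remote code; adjust if this repo's version differs.
prompt = "<image>\nFree OCR. "
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="sample.png",  # your input image (assumption)
    output_path="./output",
)
print(result)
```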
### Manual Loading with Dequantization
```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
)

# Download and load the quantized weights
weights_path = hf_hub_download(
    "SamMikaelson/deepseek-ocr-mbq-w4bit", "model.safetensors"
)
state_dict = load_file(weights_path)

# Separate int8 weights from their scale factors
weights, scales = {}, {}
for name, param in state_dict.items():
    if ".scale" in name:
        scales[name.replace(".scale", "")] = param
    else:
        weights[name] = param

# Dequantize: int8 weight * scale, cast back to bfloat16
dequantized_state_dict = {}
for name, param in weights.items():
    if name in scales:
        dequantized_state_dict[name] = (param.float() * scales[name]).to(torch.bfloat16)
    else:
        dequantized_state_dict[name] = param

# Load the model architecture (included in this repo!)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Replace the weights with the dequantized tensors
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()
print("✅ Model loaded successfully!")
```
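After either loading path, a quick sanity check of the in-memory footprint. Note this is illustrative only: because both recipes dequantize to bfloat16, expect roughly the original model's runtime size rather than the 3.51 GB on-disk figure.

```python
# Sum parameter bytes of the loaded model (bf16 = 2 bytes per parameter)
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"In-memory parameter size: {total_bytes / 1024**3:.2f} GB")
```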
## Model Files
### Core Files
- **model.safetensors** (3.51 GB): Quantized model weights (int8 + scales)
- **load_mbq_model.py**: Helper script for loading
### Architecture Files (from original model)
- **modeling_deepseekocr.py**: Main model architecture
- **modeling_deepseekv2.py**: DeepSeek V2 backbone
- **configuration_deepseek_v2.py**: Model configuration
- **deepencoder.py**: Vision encoder
- **conversation.py**: Conversation utilities
- **processor_config.json**: Processor configuration
### Tokenizer & Config
- **tokenizer.json**: Tokenizer vocabulary
- **tokenizer_config.json**: Tokenizer configuration
- **config.json**: Model configuration
- **special_tokens_map.json**: Special tokens
### Metadata
- **quantization_metadata.json**: Quantization details
- **quantization_report.json**: Compression statistics
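To inspect the quantization details without loading the model, you can fetch and print the metadata file. The exact JSON schema isn't documented here, so treat this as a quick look rather than a stable API:

```python
import json
from huggingface_hub import hf_hub_download

# Download just the metadata file and pretty-print its contents
meta_path = hf_hub_download(
    "SamMikaelson/deepseek-ocr-mbq-w4bit", "quantization_metadata.json"
)
with open(meta_path) as f:
    print(json.dumps(json.load(f), indent=2))
```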
## Advantages
- **Standalone**: All files included; no need to download the original model
- **Smaller Size**: 47% reduction in on-disk model size
- **Easy Loading**: Simple `AutoModel.from_pretrained()` with `trust_remote_code=True`
- **Compatible**: Works with the standard transformers library
- **Preserved Quality**: Mixed precision maintains model performance
## MBQ Methodology
MBQ (Mixed-precision post-training quantization) intelligently allocates different bit-widths to layers based on their sensitivity:
1. **Sensitivity Analysis**: Computes sensitivity scores using Hessian approximation
2. **Mixed Precision**: High-sensitivity layers (top 15%) → 8-bit, others → 4-bit
3. **Symmetric Quantization**: Efficient quantization scheme for weights and activations
4. **Storage**: Weights stored as int8 with separate scale factors for true compression
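The storage format from step 4 can be illustrated with a per-channel symmetric int8 round trip. This is a minimal sketch of the scheme described above, not the exact code used to produce this checkpoint:

```python
import torch

def quantize_symmetric_int8(weight: torch.Tensor):
    """Per-output-channel symmetric int8 quantization: weight ≈ q * scale."""
    # Largest magnitude per output channel sets the scale
    max_abs = weight.abs().amax(dim=1, keepdim=True)
    scale = (max_abs / 127.0).clamp(min=1e-8)  # avoid division by zero
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

w = torch.randn(64, 128)
q, scale = quantize_symmetric_int8(w)
w_hat = q.float() * scale  # dequantize, as in the manual loading example above
print(f"max reconstruction error: {(w - w_hat).abs().max().item():.4f}")
```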
## Performance
- **On-Disk Size**: Reduced by 47.4% (from 6.67 GB to 3.51 GB)
- **Standalone**: No dependency on the original model repo ✅
- **Loading**: Smaller download and lower disk footprint mean faster first-time loading
- **Runtime Memory**: Note that the loading recipes above dequantize to bfloat16, so in-memory usage approaches the original model's
## Citation
If you use this quantized model, please cite:
```bibtex
@misc{deepseek-ocr-mbq,
  author       = {SamMikaelson},
  title        = {DeepSeek-OCR MBQ Quantized Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SamMikaelson/deepseek-ocr-mbq-w4bit}}
}
```
Original model:
```bibtex
@misc{deepseek-ocr,
  title        = {DeepSeek-OCR},
  author       = {DeepSeek-AI},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/deepseek-ai/DeepSeek-OCR}}
}
```
## License
MIT License (same as the base model)
## Troubleshooting
If you encounter issues loading the model:
1. Ensure `trust_remote_code=True` is set
2. Install required packages: `pip install -r requirements.txt`
3. Check that you're using transformers >= 4.40.0
4. Use the provided `load_mbq_model.py` helper script
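
A quick way to verify item 3 programmatically:

```python
from packaging import version
import transformers

# Fails loudly if the installed transformers is too old
assert version.parse(transformers.__version__) >= version.parse("4.40.0"), \
    f"transformers {transformers.__version__} is too old; need >= 4.40.0"
```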
For questions or issues, please open an issue on the model repository.