---
language:
- en
- zh
license: mit
library_name: transformers
tags:
- ocr
- quantization
- mbq
- deepseek
- vision-language
- standalone
base_model: deepseek-ai/DeepSeek-OCR
---

# DeepSeek-OCR MBQ Quantized Model (Standalone)

This is a **fully standalone** quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) using **MBQ (Mixed-precision post-training quantization)**.

✨ **No need to download the original model** - all architecture files included!

## Model Details

- **Base Model**: deepseek-ai/DeepSeek-OCR
- **Quantization Method**: MBQ (Mixed-precision Quantization)
- **Weight Precision**: 4-bit (mixed with 8-bit for sensitive layers)
- **Activation Precision**: 8-bit
- **Format**: SafeTensors (int8 quantized with scales)
- **Standalone**: All architecture files included ✅

## Quantization Statistics

| Metric | Value |
|--------|-------|
| Original Size | 6,672 MB (6.67 GB) |
| **Quantized Size** | **3,510 MB (3.51 GB)** |
| **Size Reduction** | **3,162 MB (47.4%)** |
| **Compression Ratio** | **1.90x** |
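
The figures in the table are mutually consistent; a quick arithmetic check:

```python
# Sanity-check the reported compression figures (values taken from the table above).
original_mb = 6672
quantized_mb = 3510

reduction_mb = original_mb - quantized_mb
reduction_pct = 100 * reduction_mb / original_mb
ratio = original_mb / quantized_mb

print(f"{reduction_mb} MB ({reduction_pct:.1f}%), {ratio:.2f}x")  # 3162 MB (47.4%), 1.90x
```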

## Quick Start (Standalone - No Original Model Needed!)

### Installation

```bash
pip install torch transformers safetensors accelerate pillow
```

### Simple Loading (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights with the bundled helper
# (load_mbq_model.py ships in this repo; run from a local clone or download it first)
from load_mbq_model import load_mbq_model
state_dict = load_mbq_model("./")  # path to the directory containing model.safetensors

model.load_state_dict(state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```

### Manual Loading with Dequantization

```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

# Load quantized weights (assumes model.safetensors has been downloaded locally)
state_dict = load_file("model.safetensors")

# Separate weights and scales
weights = {}
scales = {}

for name, param in state_dict.items():
    if '.scale' in name:
        scales[name.replace('.scale', '')] = param
    else:
        weights[name] = param

# Dequantize weights
dequantized_state_dict = {}
for name, param in weights.items():
    if name in scales:
        scale = scales[name]
        dequantized = (param.float() * scale).to(torch.bfloat16)
        dequantized_state_dict[name] = dequantized
    else:
        dequantized_state_dict[name] = param

# Load model architecture (included in this repo!)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```

## Model Files

### Core Files
- **model.safetensors** (3.51 GB): Quantized model weights (int8 + scales)
- **load_mbq_model.py**: Helper script for loading

### Architecture Files (from original model)
- **modeling_deepseekocr.py**: Main model architecture
- **modeling_deepseekv2.py**: DeepSeek V2 backbone
- **configuration_deepseek_v2.py**: Model configuration
- **deepencoder.py**: Vision encoder
- **conversation.py**: Conversation utilities
- **processor_config.json**: Processor configuration

### Tokenizer & Config
- **tokenizer.json**: Tokenizer vocabulary
- **tokenizer_config.json**: Tokenizer configuration
- **config.json**: Model configuration
- **special_tokens_map.json**: Special tokens

### Metadata
- **quantization_metadata.json**: Quantization details
- **quantization_report.json**: Compression statistics

## Advantages

✅ **Standalone**: All files included, no need to download the original model  
✅ **Smaller Size**: 47% reduction in model size  
✅ **Easy Loading**: Simple `AutoModel.from_pretrained()` with `trust_remote_code=True`  
✅ **Compatible**: Works with the standard transformers library  
✅ **Preserved Quality**: Mixed precision maintains model performance  

## MBQ Methodology

MBQ (Mixed-precision post-training quantization) intelligently allocates different bit-widths to layers based on their sensitivity:

1. **Sensitivity Analysis**: Computes sensitivity scores using Hessian approximation
2. **Mixed Precision**: High-sensitivity layers (top 15%) → 8-bit, others → 4-bit
3. **Symmetric Quantization**: Efficient quantization scheme for weights and activations
4. **Storage**: Weights stored as int8 with separate scale factors for true compression
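
Steps 3–4 can be sketched as a per-tensor symmetric int8 quantize/dequantize round trip. This is an illustrative sketch of the general technique, not the exact scheme used to produce this checkpoint:

```python
import torch

def quantize_symmetric_int8(w: torch.Tensor):
    """Per-tensor symmetric quantization: map max |w| onto the int8 range."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale  # store q (int8) plus a separate scale factor, as in model.safetensors

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

torch.manual_seed(0)
w = torch.randn(8, 8)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step (0.5 * scale).
max_err = (w - w_hat).abs().max().item()
print(max_err <= 0.5 * scale.item() + 1e-6)  # True
```

The same idea extends to mixed precision by choosing 255 levels (int8) for the top-sensitivity layers and 15 levels (4-bit) for the rest.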

## Performance

- **Memory Usage**: Reduced by 47.4%
- **Model Size**: From 6.67 GB to 3.51 GB
- **Standalone**: No dependency on original model repo ✅
- **Inference**: Lower memory footprint, faster loading

## Citation

If you use this quantized model, please cite:

```bibtex
@misc{deepseek-ocr-mbq,
  author = {SamMikaelson},
  title = {DeepSeek-OCR MBQ Quantized Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SamMikaelson/deepseek-ocr-mbq-w4bit}}
}
```

Original model:
```bibtex
@misc{deepseek-ocr,
  title={DeepSeek-OCR},
  author={DeepSeek-AI},
  year={2024},
  howpublished={\url{https://huggingface.co/deepseek-ai/DeepSeek-OCR}}
}
```

## License

MIT License (same as the base model)

## Troubleshooting

If you encounter issues loading the model:

1. Ensure `trust_remote_code=True` is set
2. Install required packages: `pip install -r requirements.txt`
3. Check that you're using transformers >= 4.40.0
4. Use the provided `load_mbq_model.py` helper script

For questions or issues, please open an issue on the model repository.