---
language:
- en
- zh
license: mit
library_name: transformers
tags:
- ocr
- quantization
- mbq
- deepseek
- vision-language
- standalone
base_model: deepseek-ai/DeepSeek-OCR
---

# DeepSeek-OCR MBQ Quantized Model (Standalone)

This is a **fully standalone** quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) using **MBQ (Mixed-precision post-training quantization)**.

✨ **No need to download the original model** - all architecture files included!

## Model Details

- **Base Model**: deepseek-ai/DeepSeek-OCR
- **Quantization Method**: MBQ (Mixed-precision Quantization)
- **Weight Precision**: 4-bit (mixed with 8-bit for sensitive layers)
- **Activation Precision**: 8-bit
- **Format**: SafeTensors (int8 quantized with scales)
- **Standalone**: All architecture files included ✅

## Quantization Statistics

| Metric | Value |
|--------|-------|
| Original Size | 6,672 MB (6.67 GB) |
| **Quantized Size** | **3,510 MB (3.51 GB)** |
| **Size Reduction** | **3,162 MB (47.4%)** |
| **Compression Ratio** | **1.90x** |
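
The figures in the table are mutually consistent; a quick arithmetic check:

```python
# Sanity-check the reported compression figures (values taken from the table above).
original_mb = 6672
quantized_mb = 3510

reduction_mb = original_mb - quantized_mb
reduction_pct = 100 * reduction_mb / original_mb
ratio = original_mb / quantized_mb

print(f"{reduction_mb} MB ({reduction_pct:.1f}%), {ratio:.2f}x")  # 3162 MB (47.4%), 1.90x
```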

## Quick Start (Standalone - No Original Model Needed!)

### Installation

```bash
pip install torch transformers safetensors accelerate pillow
```

### Simple Loading (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights with the bundled helper
# (load_mbq_model.py ships in this repo; run from a local clone or download it first)
from load_mbq_model import load_mbq_model
state_dict = load_mbq_model("./")  # path to the directory containing model.safetensors

model.load_state_dict(state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```

### Manual Loading with Dequantization

```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

# Load quantized weights (assumes model.safetensors has been downloaded locally)
state_dict = load_file("model.safetensors")

# Separate weights and scales
weights = {}
scales = {}

for name, param in state_dict.items():
    if '.scale' in name:
        scales[name.replace('.scale', '')] = param
    else:
        weights[name] = param

# Dequantize weights
dequantized_state_dict = {}
for name, param in weights.items():
    if name in scales:
        scale = scales[name]
        dequantized = (param.float() * scale).to(torch.bfloat16)
        dequantized_state_dict[name] = dequantized
    else:
        dequantized_state_dict[name] = param

# Load model architecture (included in this repo!)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```

## Model Files

### Core Files
- **model.safetensors** (3.51 GB): Quantized model weights (int8 + scales)
- **load_mbq_model.py**: Helper script for loading

### Architecture Files (from original model)
- **modeling_deepseekocr.py**: Main model architecture
- **modeling_deepseekv2.py**: DeepSeek V2 backbone
- **configuration_deepseek_v2.py**: Model configuration
- **deepencoder.py**: Vision encoder
- **conversation.py**: Conversation utilities
- **processor_config.json**: Processor configuration

### Tokenizer & Config
- **tokenizer.json**: Tokenizer vocabulary
- **tokenizer_config.json**: Tokenizer configuration
- **config.json**: Model configuration
- **special_tokens_map.json**: Special tokens

### Metadata
- **quantization_metadata.json**: Quantization details
- **quantization_report.json**: Compression statistics

## Advantages

✅ **Standalone**: All files included, no need to download the original model  
✅ **Smaller Size**: 47% reduction in model size  
✅ **Easy Loading**: Simple `AutoModel.from_pretrained()` with `trust_remote_code=True`  
✅ **Compatible**: Works with the standard transformers library  
✅ **Preserved Quality**: Mixed precision maintains model performance  

## MBQ Methodology

MBQ (Mixed-precision post-training quantization) intelligently allocates different bit-widths to layers based on their sensitivity:

1. **Sensitivity Analysis**: Computes sensitivity scores using Hessian approximation
2. **Mixed Precision**: High-sensitivity layers (top 15%) → 8-bit, others → 4-bit
3. **Symmetric Quantization**: Efficient quantization scheme for weights and activations
4. **Storage**: Weights stored as int8 with separate scale factors for true compression
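
Steps 3–4 can be sketched as a per-tensor symmetric int8 quantize/dequantize round trip. This is an illustrative sketch of the general technique, not the exact scheme used to produce this checkpoint:

```python
import torch

def quantize_symmetric_int8(w: torch.Tensor):
    """Per-tensor symmetric quantization: map max |w| onto the int8 range."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale  # store q (int8) plus a separate scale factor, as in model.safetensors

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

torch.manual_seed(0)
w = torch.randn(8, 8)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step (0.5 * scale).
max_err = (w - w_hat).abs().max().item()
print(max_err <= 0.5 * scale.item() + 1e-6)  # True
```

The same idea extends to mixed precision by choosing 255 levels (int8) for the top-sensitivity layers and 15 levels (4-bit) for the rest.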

## Performance

- **Memory Usage**: Reduced by 47.4%
- **Model Size**: From 6.67 GB to 3.51 GB
- **Standalone**: No dependency on original model repo ✅
- **Inference**: Lower memory footprint, faster loading

## Citation

If you use this quantized model, please cite:

```bibtex
@misc{deepseek-ocr-mbq,
  author = {SamMikaelson},
  title = {DeepSeek-OCR MBQ Quantized Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/SamMikaelson/deepseek-ocr-mbq-w4bit}}
}
```

Original model:
```bibtex
@misc{deepseek-ocr,
  title={DeepSeek-OCR},
  author={DeepSeek-AI},
  year={2024},
  howpublished={\url{https://huggingface.co/deepseek-ai/DeepSeek-OCR}}
}
```

## License

MIT License (same as the base model)

## Troubleshooting

If you encounter issues loading the model:

1. Ensure `trust_remote_code=True` is set
2. Install required packages: `pip install -r requirements.txt`
3. Check that you're using transformers >= 4.40.0
4. Use the provided `load_mbq_model.py` helper script

For questions or issues, please open an issue on the model repository.