Add README.md
README.md (CHANGED)
tags:
- mbq
- deepseek
- vision-language
- standalone
base_model: deepseek-ai/DeepSeek-OCR
---

# DeepSeek-OCR MBQ Quantized Model (Standalone)

This is a **fully standalone** quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) using **MBQ (Mixed-precision post-training quantization)**.

✨ **No need to download the original model** - all architecture files included!

## Model Details

- **Quantization Method**: MBQ (Mixed-precision Quantization)
- **Weight Precision**: 4-bit (mixed with 8-bit for sensitive layers)
- **Activation Precision**: 8-bit
- **Format**: SafeTensors (int8 quantized weights with scales)
- **Standalone**: All architecture files included ✅

## Quantization Statistics

| Metric | Value |
|--------|-------|
| Original Size | 6,672 MB (6.67 GB) |
| **Quantized Size** | **3,510 MB (3.51 GB)** |
| **Size Reduction** | **3,162 MB (47.4%)** |
| **Compression Ratio** | **1.90x** |
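The derived rows follow directly from the two reported sizes; a quick check of the arithmetic:

```python
original_mb, quantized_mb = 6672, 3510
print(original_mb - quantized_mb)                        # 3162 MB saved
print(round(100 * (1 - quantized_mb / original_mb), 1))  # 47.4 (% reduction)
print(round(original_mb / quantized_mb, 2))              # 1.9 (compression ratio)
```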
## Quick Start (Standalone - No Original Model Needed!)

### Installation

```bash
pip install torch transformers safetensors accelerate pillow
```

### Simple Loading (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer directly - all files included!
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the quantized weights using the helper
from load_mbq_model import load_mbq_model
state_dict = load_mbq_model("./")  # Assumes files are in current directory

model.load_state_dict(state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```
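Once loaded, inference can follow the original DeepSeek-OCR model card. A minimal sketch, assuming the upstream `infer` helper and prompt format carry over unchanged to this quantized repo (the image and output paths are placeholders):

```python
# Prompt format and `infer` arguments follow the upstream DeepSeek-OCR model card;
# they are assumed to work unchanged with the dequantized weights loaded above.
prompt = "<image>\n<|grounding|>Convert the document to markdown. "

result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="your_image.jpg",  # placeholder input image
    output_path="./ocr_output",   # placeholder output directory
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
)
```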
### Manual Loading with Dequantization

```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True
)

# Load quantized weights (model.safetensors from a local copy of this repo)
state_dict = load_file("model.safetensors")

# Separate weights and scales
weights = {}
scales = {}

for name, param in state_dict.items():
    if '.scale' in name:
        scales[name.replace('.scale', '')] = param
    else:
        weights[name] = param

# Dequantize weights
dequantized_state_dict = {}
for name, param in weights.items():
    if name in scales:
        scale = scales[name]
        dequantized = (param.float() * scale).to(torch.bfloat16)
        dequantized_state_dict[name] = dequantized
    else:
        dequantized_state_dict[name] = param

# Load model architecture (included in this repo!)
model = AutoModel.from_pretrained(
    "SamMikaelson/deepseek-ocr-mbq-w4bit",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)

# Load the dequantized weights
model.load_state_dict(dequantized_state_dict)
model = model.to(device).eval()

print("✅ Model loaded successfully!")
```
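The bundled `load_mbq_model.py` is expected to wrap these same steps. A minimal sketch of such a helper, assuming the int8-plus-scale layout shown above (the shipped script may be implemented differently):

```python
import os
import torch
from safetensors.torch import load_file

def load_mbq_model(repo_dir: str) -> dict:
    """Rebuild a bfloat16 state dict from int8 weights and their scales.

    Sketch only: mirrors the manual dequantization above; the actual
    load_mbq_model.py in this repo may differ in details.
    """
    raw = load_file(os.path.join(repo_dir, "model.safetensors"))
    scales = {k[:-len(".scale")]: v for k, v in raw.items() if k.endswith(".scale")}
    state_dict = {}
    for name, tensor in raw.items():
        if name.endswith(".scale"):
            continue  # scale tensors are folded into their matching weights
        if name in scales:
            state_dict[name] = (tensor.float() * scales[name]).to(torch.bfloat16)
        else:
            state_dict[name] = tensor
    return state_dict
```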
## Model Files

### Core Files
- **model.safetensors** (3.51 GB): Quantized model weights (int8 + scales)
- **load_mbq_model.py**: Helper script for loading

### Architecture Files (from original model)
- **modeling_deepseekocr.py**: Main model architecture
- **modeling_deepseekv2.py**: DeepSeek V2 backbone
- **configuration_deepseek_v2.py**: Model configuration
- **deepencoder.py**: Vision encoder
- **conversation.py**: Conversation utilities
- **processor_config.json**: Processor configuration

### Tokenizer & Config
- **tokenizer.json**: Tokenizer vocabulary
- **tokenizer_config.json**: Tokenizer configuration
- **config.json**: Model configuration
- **special_tokens_map.json**: Special tokens

### Metadata
- **quantization_metadata.json**: Quantization details
- **quantization_report.json**: Compression statistics
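Both metadata files are plain JSON and can be inspected without loading the model (no particular schema is assumed here):

```python
import json

# Pretty-print whatever statistics the quantization report contains.
with open("quantization_report.json") as f:
    print(json.dumps(json.load(f), indent=2))
```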
## Advantages

✅ **Standalone**: All files included, no need to download the original model
✅ **Smaller Size**: 47% reduction in model size
✅ **Easy Loading**: Simple `AutoModel.from_pretrained()` with `trust_remote_code=True`
✅ **Compatible**: Works with the standard transformers library
✅ **Preserved Quality**: Mixed precision maintains model performance

## MBQ Methodology

MBQ (Mixed-precision post-training quantization) allocates bit-widths to layers according to their sensitivity:

1. **Sensitivity Analysis**: Computes per-layer sensitivity scores using a Hessian approximation
2. **Mixed Precision**: High-sensitivity layers (top 15%) → 8-bit, all other layers → 4-bit
3. **Symmetric Quantization**: Efficient quantization scheme for weights and activations (see the sketch below)
4. **Storage**: Weights stored as int8 with separate scale factors for true compression
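As an illustration of steps 3 and 4, here is a minimal per-tensor symmetric int8 quantization sketch, consistent with the `weight * scale` dequantization used above; the actual MBQ pipeline, its bit allocation, and any 4-bit packing are not reproduced here:

```python
import torch

def quantize_symmetric_int8(weight: torch.Tensor):
    """Per-tensor symmetric quantization: weight ≈ q_weight * scale."""
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    q_weight = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q_weight, scale

def dequantize(q_weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Mirrors the manual loading code above: weights recovered as int8 * scale.
    return (q_weight.float() * scale).to(torch.bfloat16)

w = torch.randn(4, 8)
q, s = quantize_symmetric_int8(w)
error = (w - dequantize(q, s).float()).abs().max()
print(f"max reconstruction error: {error.item():.5f}")
```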
## Performance

- **Memory Usage**: Reduced by 47.4%
- **Model Size**: From 6.67 GB to 3.51 GB
- **Standalone**: No dependency on original model repo ✅
- **Inference**: Lower memory footprint, faster loading

## Citation

## License

MIT License (same as the base model)

## Troubleshooting

If you encounter issues loading the model:

1. Ensure `trust_remote_code=True` is set
2. Install required packages: `pip install -r requirements.txt`
3. Check that you're using transformers >= 4.40.0
4. Use the provided `load_mbq_model.py` helper script

For questions or issues, please open an issue on the model repository.