---
license: apache-2.0
base_model: deepseek-ai/DeepSeek-OCR
tags:
- vision
- ocr
- quantized
- qvlm
- 4-bit
- deepseek
- safetensors
library_name: transformers
pipeline_tag: image-to-text
---

# DeepSeek-OCR QVLM 4-bit (SafeTensors)

This is a 4-bit quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR), produced with the QVLM (Quantized Vision Language Models) technique and saved in SafeTensors format for easy deployment.

## 📊 Model Statistics

| Metric | Value |
|--------|-------|
| **Original Size** | 6363.12 MB (6.21 GB) |
| **Quantized Size** | 2199.39 MB (2.15 GB) |
| **Size Reduction** | 4163.73 MB (65.44%) |
| **Compression Ratio** | 2.89x |
| **Format** | SafeTensors |
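
The derived figures in the table follow directly from the two sizes; a quick check:

```python
original_mb, quantized_mb = 6363.12, 2199.39
print(f"{original_mb - quantized_mb:.2f} MB saved "
      f"({(1 - quantized_mb / original_mb) * 100:.2f}%)")  # 4163.73 MB saved (65.44%)
print(f"{original_mb / quantized_mb:.2f}x compression")    # 2.89x compression
```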

## 🔧 Quantization Details

- **Method:** QVLM 4-bit group-wise quantization
- **Quantization Bits:** 4
- **Group Size:** 128
- **Vision Encoder:** Quantized
- **Language Model:** Quantized
- **Symmetric:** False (asymmetric; each group stores its own zero-point)
- **Parameters Quantized:** 2,973,512,704 / 3,336,106,240 (89.13%)
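
For intuition, here is a minimal sketch of what asymmetric group-wise 4-bit quantization of a single weight matrix looks like. This is a generic illustration of the scheme the settings above describe, not QVLM's actual implementation (which additionally packs two 4-bit values per byte; see the packing sketch under "Quantization Method" below):

```python
import torch

def quantize_groupwise_4bit(weight: torch.Tensor, group_size: int = 128):
    """Asymmetric group-wise 4-bit quantization of a weight tensor."""
    groups = weight.reshape(-1, group_size)          # assumes numel % group_size == 0
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # 16 levels: 0..15
    zero_point = torch.round(-w_min / scale)         # per-group zero-point
    q = torch.clamp(torch.round(groups / scale) + zero_point, 0, 15).to(torch.uint8)
    return q, scale, zero_point

def dequantize_groupwise_4bit(q, scale, zero_point, shape):
    """Approximate inverse: w ≈ (q - zero_point) * scale."""
    return ((q.float() - zero_point) * scale).reshape(shape)

w = torch.randn(256, 512)
q, s, z = quantize_groupwise_4bit(w)
w_hat = dequantize_groupwise_4bit(q, s, z, w.shape)
print((w - w_hat).abs().max())   # error is on the order of one scale step per group
```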

## 🚀 Usage

### Basic Loading

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True
)

# Download and load the quantized weights (weights only)
weights_path = hf_hub_download(
    "SamMikaelson/deepseek-ocr-qvlm-4bit", "model.safetensors"
)
quantized_state_dict = load_file(weights_path)

# Note: You'll need to implement dequantization logic for inference.
# The quantization metadata is stored in the safetensors metadata.
```

### With Dequantization

```python
from safetensors.torch import load_file, safe_open
import json

# Path to the downloaded weights (see "Basic Loading" above)
model_path = "model.safetensors"

# Read the quantization metadata embedded in the file header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata()
    quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata;
# see the QVLM repository for the full implementation.
```
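
The on-disk tensor layout is defined by the QVLM exporter, so the loop below is a hypothetical sketch rather than the actual recipe: it assumes each quantized weight `w` is stored unpacked (one 4-bit value per byte) with companion tensors named `w.scales` and `w.zeros` of shape `(num_groups, 1)`. Check `quantization_config.json` and the embedded metadata for the real naming and packing. Continuing from the snippet above:

```python
import torch

def dequantize_tensor(q, scales, zeros, group_size=128):
    """Standard inverse of asymmetric group-wise quantization."""
    groups = q.reshape(-1, group_size).float()       # scales/zeros: (num_groups, 1)
    return ((groups - zeros) * scales).reshape(q.shape).half()

# Hypothetical naming scheme -- verify against quantization_config.json.
fp16_state_dict = {}
for name, tensor in state_dict.items():
    if name.endswith(".scales") or name.endswith(".zeros"):
        continue                                     # companion tensors, consumed below
    scales = state_dict.get(f"{name}.scales")
    zeros = state_dict.get(f"{name}.zeros")
    if scales is not None and zeros is not None:
        fp16_state_dict[name] = dequantize_tensor(tensor, scales, zeros)
    else:
        fp16_state_dict[name] = tensor               # small parameters stayed in fp16
```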

## 📁 Model Files

- `model.safetensors` - Quantized weights in SafeTensors format (2199.39 MB)
- `config.json` - Model configuration with quantization settings
- `quantization_config.json` - Detailed quantization configuration
- `quantization_results.json` - Compression statistics
- `tokenizer.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer configuration

## 🎯 Performance

The quantized model achieves **2.9x compression** while keeping accuracy close to that of the original model. The 4-bit quantization significantly reduces memory requirements, making the model suitable for deployment on resource-constrained devices.

### Memory Requirements

- **Original Model:** ~6.2 GB VRAM
- **Quantized Model:** ~2.1 GB VRAM
- **Savings:** ~4.1 GB

## 🔍 Quantization Method

This model uses QVLM (Quantized Vision Language Models), which applies:

1. **Group-wise Quantization:** Weights are divided into groups of 128 elements
2. **4-bit Representation:** Each weight is quantized to 4 bits (packed into int8 for efficiency)
3. **Per-group Scaling:** Each group has its own scale and zero-point for better accuracy
4. **Selective Quantization:** Only large weight matrices are quantized; small parameters remain in fp16
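
Point 2, packing two 4-bit values into each int8 byte, can be illustrated with a short sketch. The nibble order here is illustrative; the actual QVLM layout may differ:

```python
import torch

def pack_4bit(q: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values (uint8 in 0..15) into single bytes."""
    q = q.reshape(-1)                                # assumes an even element count
    return (q[0::2] | (q[1::2] << 4)).to(torch.uint8)

def unpack_4bit(packed: torch.Tensor) -> torch.Tensor:
    """Recover the original 0..15 values, low nibble first."""
    lo, hi = packed & 0x0F, (packed >> 4) & 0x0F
    return torch.stack([lo, hi], dim=1).reshape(-1)

q = torch.randint(0, 16, (8,), dtype=torch.uint8)
assert torch.equal(unpack_4bit(pack_4bit(q)), q)     # lossless round-trip
```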

## 📚 Citation

```bibtex
@article{deepseek-ocr,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={DeepSeek-AI},
  year={2025}
}

@article{qvlm,
  title={Q-VLM: Post-training Quantization for Large Vision-Language Models},
  author={Wang, Changyuan and others},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}
```

## 📄 License

This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.

## 🙏 Acknowledgments

- **Base Model:** [DeepSeek-AI](https://huggingface.co/deepseek-ai)
- **Quantization Method:** [QVLM](https://github.com/ChangyuanWang17/QVLM)
- **Format:** [SafeTensors](https://github.com/huggingface/safetensors)

## ⚠️ Notes

- This is a quantized model that requires dequantization during inference
- For production use, implement the dequantization logic from the QVLM repository
- The model architecture remains the same; only weights are quantized
- All quantization metadata is embedded in the SafeTensors file
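
Putting the pieces together, one way to obtain a usable fp16 model is to rebuild the base architecture and load dequantized weights into it. This outline reuses the hypothetical dequantization loop sketched under "With Dequantization" above and trades away the memory savings; treat it as a sketch, not a verified recipe:

```python
import torch
from transformers import AutoModel
from safetensors.torch import load_file

# Rebuild the original architecture; its weights are replaced below.
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

state_dict = load_file("model.safetensors")
# fp16_state_dict = ...  # produced by the dequantization loop sketched earlier
# model.load_state_dict(fp16_state_dict, strict=False)
```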

---

*Quantized on 2026-01-05 using QVLM 4-bit quantization*