DeepSeek-OCR QVLM 4-bit (SafeTensors)

This is a 4-bit quantized version of deepseek-ai/DeepSeek-OCR, produced with the QVLM (Quantized Vision Language Models) technique and saved in SafeTensors format for easy deployment.

📊 Model Statistics

Metric            | Value
------------------|----------------------
Original Size     | 6363.12 MB (6.21 GB)
Quantized Size    | 2199.39 MB (2.15 GB)
Size Reduction    | 4165.03 MB (65.46%)
Compression Ratio | 2.89x
Format            | SafeTensors

🔧 Quantization Details

  • Method: QVLM 4-bit group-wise quantization
  • Quantization Bits: 4
  • Group Size: 128
  • Vision Encoder: Quantized
  • Language Model: Quantized
  • Symmetric: False
  • Parameters Quantized: 2,973,512,704 / 3,336,106,240 (89.13%)

🚀 Usage

Basic Loading

from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load the tokenizer (trust_remote_code is required for DeepSeek-OCR's custom code)
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True
)

# Load the quantized weights only (no model is instantiated here)
quantized_state_dict = load_file("model.safetensors")

# Note: you'll need to implement dequantization logic for inference.
# The quantization metadata is stored in the SafeTensors file's metadata.
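
The quantization settings also ship as plain JSON (see Model Files below), which can be easier to inspect than the embedded metadata. A minimal sketch, assuming the files were downloaded locally; the field names inside the JSON are not documented here, so print the dict to see the actual schema:

import json

# quantization_config.json ships alongside model.safetensors (see Model Files)
with open("quantization_config.json") as f:
    qcfg = json.load(f)
print(qcfg)  # e.g. bits, group size, symmetric flag -- inspect the actual keys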

With Dequantization

from safetensors.torch import load_file, safe_open
import json

model_path = "model.safetensors"

# Read the quantization metadata embedded in the SafeTensors header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata() or {}  # metadata() returns None if nothing was stored
    quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load the quantized state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata.
# See the QVLM repository for the full implementation.
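
Before writing the dequantization logic, it can help to see how the checkpoint is actually laid out. The loop below only prints what is already in state_dict, so it makes no assumptions about tensor naming:

# Packed 4-bit weights appear as int8 tensors; per-group scales/zero-points
# and the unquantized parameters remain in fp16.
for name, tensor in state_dict.items():
    print(name, tensor.dtype, tuple(tensor.shape))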

📁 Model Files

  • model.safetensors - Quantized weights in SafeTensors format (2199.39 MB)
  • config.json - Model configuration with quantization settings
  • quantization_config.json - Detailed quantization configuration
  • quantization_results.json - Compression statistics
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer configuration

🎯 Performance

The quantized model achieves a 2.89x compression ratio while maintaining accuracy comparable to the original model. The 4-bit quantization substantially reduces memory requirements, making the model suitable for deployment on resource-constrained devices.

Memory Requirements

  • Original Model: ~6.2 GB VRAM
  • Quantized Model: ~2.1 GB VRAM
  • Savings: ~4.1 GB

🔍 Quantization Method

This model uses QVLM (Quantized Vision Language Models), which applies:

  1. Group-wise Quantization: Weights are divided into groups of 128 elements
  2. 4-bit Representation: Each weight is quantized to 4 bits (packed into int8 for efficiency)
  3. Per-group Scaling: Each group has its own scale and zero-point for better accuracy
  4. Selective Quantization: Only large weight matrices are quantized; small parameters remain in fp16
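
A minimal sketch of steps 1-3 in PyTorch (group-wise, asymmetric 4-bit quantization and its inverse). The helper names are hypothetical, and the sketch omits the int8 bit-packing from step 2 as well as this checkpoint's exact tensor layout; see the QVLM repository for the full implementation:

import math
import torch

def quantize_groupwise(weight, bits=4, group_size=128):
    # Flatten and zero-pad so the tensor splits evenly into groups (step 1).
    flat = weight.flatten().float()
    pad = (-flat.numel()) % group_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    groups = flat.view(-1, group_size)
    qmax = 2 ** bits - 1  # 15 for 4-bit
    # Asymmetric quantization: each group gets its own min/max range,
    # hence its own scale and zero-point (step 3).
    wmin = groups.min(dim=1, keepdim=True).values
    wmax = groups.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero_point = (-wmin / scale).round()
    q = (groups / scale + zero_point).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_groupwise(q, scale, zero_point, shape):
    # Invert the affine mapping, drop the padding, restore shape in fp16.
    groups = (q.float() - zero_point) * scale
    return groups.flatten()[:math.prod(shape)].view(shape).half()

# Round-trip check: error is bounded by half a quantization step per group.
w = torch.randn(256, 512)
q, s, zp = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, zp, w.shape)
print((w - w_hat.float()).abs().max())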

📚 Citation

@article{deepseek-ocr,
  title={DeepSeek-OCR: Optical Character Recognition Model},
  author={DeepSeek-AI},
  year={2024}
}

@article{qvlm,
  title={QVLM: Quantized Vision Language Models},
  author={Wang, Changyuan},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}

📄 License

This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.

🙏 Acknowledgments

  • deepseek-ai for the base DeepSeek-OCR model
  • Changyuan Wang for the QVLM quantization method

⚠️ Notes

  • This is a quantized model that requires dequantization during inference
  • For production use, implement the dequantization logic from the QVLM repository
  • The model architecture remains the same; only weights are quantized
  • All quantization metadata is embedded in the SafeTensors file

Quantized on 2026-01-05 using QVLM 4-bit quantization
