DeepSeek-OCR QVLM 4-bit (SafeTensors)

This is a 4-bit quantized version of deepseek-ai/DeepSeek-OCR, produced with the QVLM (Quantized Vision Language Models) technique and saved in SafeTensors format for easy deployment.

📊 Model Statistics

Metric            | Value
------------------|----------------------
Original Size     | 6363.12 MB (6.21 GB)
Quantized Size    | 2199.39 MB (2.15 GB)
Size Reduction    | 4165.03 MB (65.46%)
Compression Ratio | 2.89x
Format            | SafeTensors

🔧 Quantization Details

  • Method: QVLM 4-bit group-wise quantization
  • Quantization Bits: 4
  • Group Size: 128
  • Vision Encoder: Quantized
  • Language Model: Quantized
  • Symmetric: False
  • Parameters Quantized: 2,973,512,704 / 3,336,106,240 (89.13%)

🚀 Usage

Basic Loading

from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load the tokenizer (trust_remote_code is required for DeepSeek-OCR's custom code)
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-qvlm-4bit",
    trust_remote_code=True
)

# Load the quantized weights only (no model is instantiated here)
quantized_state_dict = load_file("model.safetensors")

# Note: you'll need to implement dequantization logic for inference.
# The quantization metadata is stored in the SafeTensors file's metadata.
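
The quantization settings also ship as plain JSON (see Model Files below), which can be easier to inspect than the embedded metadata. A minimal sketch, assuming the files were downloaded locally; the field names inside the JSON are not documented here, so print the dict to see the actual schema:

import json

# quantization_config.json ships alongside model.safetensors (see Model Files)
with open("quantization_config.json") as f:
    qcfg = json.load(f)
print(qcfg)  # e.g. bits, group size, symmetric flag -- inspect the actual keys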

With Dequantization

from safetensors.torch import load_file, safe_open
import json

model_path = "model.safetensors"

# Read the quantization metadata embedded in the SafeTensors header
with safe_open(model_path, framework="pt", device="cpu") as f:
    metadata = f.metadata() or {}  # metadata() returns None if nothing was stored
    quantization_metadata = json.loads(metadata.get("quantization_metadata", "{}"))

# Load the quantized state dict
state_dict = load_file(model_path)

# Implement dequantization here based on the metadata.
# See the QVLM repository for the full implementation.
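
Before writing the dequantization logic, it can help to see how the checkpoint is actually laid out. The loop below only prints what is already in state_dict, so it makes no assumptions about tensor naming:

# Packed 4-bit weights appear as int8 tensors; per-group scales/zero-points
# and the unquantized parameters remain in fp16.
for name, tensor in state_dict.items():
    print(name, tensor.dtype, tuple(tensor.shape))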

📁 Model Files

  • model.safetensors - Quantized weights in SafeTensors format (2199.39 MB)
  • config.json - Model configuration with quantization settings
  • quantization_config.json - Detailed quantization configuration
  • quantization_results.json - Compression statistics
  • tokenizer.json - Tokenizer vocabulary
  • tokenizer_config.json - Tokenizer configuration

🎯 Performance

The quantized model achieves a 2.89x compression ratio while maintaining accuracy comparable to the original model. The 4-bit quantization substantially reduces memory requirements, making the model suitable for deployment on resource-constrained devices.

Memory Requirements

  • Original Model: ~6.2 GB VRAM
  • Quantized Model: ~2.1 GB VRAM
  • Savings: ~4.1 GB

🔍 Quantization Method

This model uses QVLM (Quantized Vision Language Models), which applies:

  1. Group-wise Quantization: Weights are divided into groups of 128 elements
  2. 4-bit Representation: Each weight is quantized to 4 bits (packed into int8 for efficiency)
  3. Per-group Scaling: Each group has its own scale and zero-point for better accuracy
  4. Selective Quantization: Only large weight matrices are quantized; small parameters remain in fp16
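
A minimal sketch of steps 1-3 in PyTorch (group-wise, asymmetric 4-bit quantization and its inverse). The helper names are hypothetical, and the sketch omits the int8 bit-packing from step 2 as well as this checkpoint's exact tensor layout; see the QVLM repository for the full implementation:

import math
import torch

def quantize_groupwise(weight, bits=4, group_size=128):
    # Flatten and zero-pad so the tensor splits evenly into groups (step 1).
    flat = weight.flatten().float()
    pad = (-flat.numel()) % group_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    groups = flat.view(-1, group_size)
    qmax = 2 ** bits - 1  # 15 for 4-bit
    # Asymmetric quantization: each group gets its own min/max range,
    # hence its own scale and zero-point (step 3).
    wmin = groups.min(dim=1, keepdim=True).values
    wmax = groups.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zero_point = (-wmin / scale).round()
    q = (groups / scale + zero_point).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_groupwise(q, scale, zero_point, shape):
    # Invert the affine mapping, drop the padding, restore shape in fp16.
    groups = (q.float() - zero_point) * scale
    return groups.flatten()[:math.prod(shape)].view(shape).half()

# Round-trip check: error is bounded by half a quantization step per group.
w = torch.randn(256, 512)
q, s, zp = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, zp, w.shape)
print((w - w_hat.float()).abs().max())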

📚 Citation

@article{deepseek-ocr,
  title={DeepSeek-OCR: Optical Character Recognition Model},
  author={DeepSeek-AI},
  year={2024}
}

@article{qvlm,
  title={QVLM: Quantized Vision Language Models},
  author={Wang, Changyuan},
  year={2024},
  url={https://github.com/ChangyuanWang17/QVLM}
}

📄 License

This model inherits the Apache 2.0 license from the base DeepSeek-OCR model.

🙏 Acknowledgments

  • deepseek-ai for the base DeepSeek-OCR model
  • Changyuan Wang for the QVLM quantization method

⚠️ Notes

  • This is a quantized model that requires dequantization during inference
  • For production use, implement the dequantization logic from the QVLM repository
  • The model architecture remains the same; only weights are quantized
  • All quantization metadata is embedded in the SafeTensors file

Quantized on 2026-01-05 using QVLM 4-bit quantization
