DeepSeek-OCR INT8 Quantized (Safetensors)

An INT8 uniformly quantized version of DeepSeek-OCR, stored in the safetensors format with JSON quantization metadata.

📊 Model Stats

  • Original Size: 6363.12 MB
  • Compressed Size: 3351.56 MB
  • On-Disk Size: 3352.37 MB (weights plus quantization metadata)
  • Compression Ratio: 1.90x
  • Format: Safetensors + JSON metadata

🚀 Quick Start

```python
from model_loader import load_quantized_model

model = load_quantized_model(
    "SamMikaelson/deepseek-ocr-int8-uniform",
    device="cuda",
)

# The model is ready to use.
# Memory footprint: ~3352 MB
```

🔧 Manual Loading

```python
import json

import torch
from safetensors.torch import load_file

# Load the quantized weights
state_dict = load_file("model.safetensors")

# Load the per-layer quantization metadata
with open("quantization_config.json") as f:
    metadata = json.load(f)

# Reconstruct the model (see model_loader.py for details)
```
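
The reconstruction step can be sketched roughly as follows. This is a simplified, hypothetical `QuantizedLinear` that dequantizes INT8 weights on the fly in its forward pass; the actual layer lives in `quantization.py`, and its metadata layout may differ.

```python
import torch
import torch.nn as nn


class QuantizedLinear(nn.Module):
    """Hypothetical sketch: holds INT8 weights plus a per-tensor scale and
    dequantizes during forward(). See quantization.py for the real layer."""

    def __init__(self, weight_int8: torch.Tensor, scale: float, bias=None):
        super().__init__()
        self.register_buffer("weight_int8", weight_int8)  # dtype=torch.int8
        self.scale = scale
        self.bias = bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize int8 -> float, then apply an ordinary linear transform.
        weight = self.weight_int8.to(x.dtype) * self.scale
        return nn.functional.linear(x, weight, self.bias)
```

A loader along these lines would walk `quantization_config.json`, pull each layer's INT8 tensor and scale out of the state dict, and swap the matching `nn.Linear` for the quantized layer.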

📦 File Structure

```
model.safetensors          # 3352 MB - all weights (compressed)
quantization_config.json   # per-layer metadata (bits, shapes)
config.json                # model config
quantization.py            # QuantizedLinear layer
model_loader.py            # loading utilities
```

✨ Why Safetensors?

  • Secure: no arbitrary code execution (unlike pickle)
  • Fast: zero-copy, memory-mapped loading
  • Standard: Hugging Face's official weights format
  • Portable: works across frameworks

📊 Quantization Details

  • Method: Uniform INT8
  • Vision layers: 96 @ 8-bit
  • Language layers: 2197 @ 8-bit
  • Dequantization: On-the-fly during forward pass
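
For reference, symmetric uniform INT8 quantization maps floats into [-127, 127] using a single scale factor. A minimal numpy sketch (the per-tensor scale here is an illustrative assumption, not necessarily the exact scheme this checkpoint uses):

```python
import numpy as np


def quantize_int8(w: np.ndarray):
    # One scale for the whole tensor: map the largest magnitude to 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


w = np.array([0.5, -1.27, 0.01], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# The round-trip error is bounded by half a quantization step (scale / 2).
```

Storing `q` (1 byte/weight) instead of float16/float32 is what yields the ~1.9x size reduction reported above.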

🔍 Inspect Metadata

```python
import json

with open("quantization_config.json") as f:
    config = json.load(f)

print(f"Quantized layers: {len(config['quantized_layers'])}")
print(f"Compression: {config['stats']['compression_ratio']}x")
```

📝 License

MIT (inherited from DeepSeek-OCR)
