DeepSeek-OCR-2-FP8

An FP8 dynamically quantized version of deepseek-ai/DeepSeek-OCR-2 for faster inference and roughly half the memory footprint.

Model Details

  • Base Model: deepseek-ai/DeepSeek-OCR-2
  • Architecture: deepseek_vl_v2 (3B parameters)
  • Quantization: FP8 Dynamic (llmcompressor)
  • Model Size: ~3.5GB (vs ~6GB BF16)
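
The size figures follow from the element width: FP8 stores one byte per weight versus two for BF16. A rough check (parameter count approximate; the remaining ~0.5GB covers quantization scales, metadata, and the unquantized lm_head):

```python
params = 3e9                 # ~3B parameters (approximate)
bf16_gb = params * 2 / 1e9   # BF16: 2 bytes per weight -> ~6 GB
fp8_gb = params * 1 / 1e9    # FP8:  1 byte per weight -> ~3 GB
```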

Quantization

Quantized with llmcompressor using the FP8_DYNAMIC scheme: weights are stored in FP8 ahead of time, while activation scales are computed per token at runtime, so no calibration dataset is needed:

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from transformers import AutoModel

# Load the base model before quantizing
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR-2",
    torch_dtype="auto",
    trust_remote_code=True
)

# Quantize all Linear layers to FP8; keep lm_head in full precision
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"]
)
oneshot(model=model, recipe=recipe)
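
Dynamic quantization derives each scale from the tensor itself rather than from calibration data. A minimal NumPy sketch of that scaling step (illustration only, not the llmcompressor internals; real FP8 E4M3 also rounds the mantissa to 3 bits, which this omits):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_scale(x: np.ndarray) -> float:
    # Per-tensor dynamic scale: map the observed max magnitude onto the
    # FP8 range. Assumes a nonzero tensor.
    return float(np.max(np.abs(x)) / E4M3_MAX)

def fake_quantize(x: np.ndarray) -> np.ndarray:
    # Simulate quantize/dequantize: scale into the FP8 range, clip, scale back.
    s = dynamic_scale(x)
    q = np.clip(x / s, -E4M3_MAX, E4M3_MAX)
    return (q * s).astype(x.dtype)

x = np.random.randn(4, 8).astype(np.float32)
y = fake_quantize(x)
```

Because the scale is recomputed for every input, an outlier in one activation does not permanently shrink the usable resolution for others, which is why dynamic schemes can skip calibration entirely.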

Usage

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    trust_remote_code=True
)

Requirements

  • transformers==4.46.3
  • torch>=2.0
  • flash-attn (recommended)
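
A typical install for the above (note that flash-attn generally needs to build against an already-installed torch, hence the separate step with --no-build-isolation):

```shell
pip install "transformers==4.46.3" "torch>=2.0"
pip install flash-attn --no-build-isolation  # optional, for faster attention
```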

License

MIT (same as base model)
