# DeepSeek-OCR-2-FP8

FP8 dynamically quantized version of [deepseek-ai/DeepSeek-OCR-2](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2) for faster inference and a smaller memory footprint.
## Model Details

- **Base Model:** deepseek-ai/DeepSeek-OCR-2
- **Architecture:** `deepseek_vl_v2` (3B parameters)
- **Quantization:** FP8 dynamic (llmcompressor)
- **Model Size:** ~3.5 GB (vs. ~6 GB in BF16)
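The size difference follows from per-parameter storage cost: BF16 holds 2 bytes per weight, FP8 holds 1. A back-of-the-envelope check (assuming a round 3×10⁹ weights; real checkpoints are somewhat larger due to quantization scales, unquantized layers such as `lm_head`, and file overhead):

```python
# Rough checkpoint-size estimate: bytes per parameter x parameter count.
# Assumes ~3e9 weights; actual files carry extra overhead.
params = 3e9

bf16_gb = params * 2 / 1e9  # BF16: 2 bytes per weight
fp8_gb = params * 1 / 1e9   # FP8: 1 byte per weight

print(f"BF16: ~{bf16_gb:.0f} GB, FP8: ~{fp8_gb:.0f} GB")
```

This matches the ~6 GB vs. ~3.5 GB figures above once overhead is included.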
## Quantization

Quantized using [llmcompressor](https://github.com/vllm-project/llm-compressor) with the `FP8_DYNAMIC` scheme, in which weights are quantized to FP8 ahead of time while activation scales are computed dynamically at runtime (no calibration dataset required):
```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize all Linear layers to FP8, keeping the output head in
# full precision to preserve logit quality.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# `model` is the BF16 base model loaded beforehand.
oneshot(model=model, recipe=recipe)
```
## Usage

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    trust_remote_code=True,
)
```
## Requirements

- `transformers==4.46.3`
- `torch>=2.0`
- `flash-attn` (recommended)
## License

MIT (same as the base model)