Paper: [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://arxiv.org/abs/2210.17323)
This is a 4-bit GPTQ quantized and bit-packed version of deepseek-ai/DeepSeek-OCR.
This model uses actual bit-packing where two 4-bit values are stored per byte, achieving true 4x compression.
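For intuition, the packing scheme at the byte level, as a plain-Python sketch (the shipped `load_4bit.py` works on whole tensors; function names here are illustrative):

```python
def pack_nibbles(high, low):
    # Store two 4-bit values (0-15) in one byte: high nibble, then low nibble
    return ((high & 0x0F) << 4) | (low & 0x0F)

def unpack_nibbles(byte):
    # Recover the two 4-bit values from a single byte
    return (byte >> 4) & 0x0F, byte & 0x0F

packed = pack_nibbles(9, 3)
assert packed == 0x93                    # both values share one byte
assert unpack_nibbles(packed) == (9, 3)  # lossless round trip
```

Because each byte now carries two weights, a 4-bit tensor occupies half as many bytes as an 8-bit one and a quarter as many as a 16-bit one.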
| Metric | Original | This Model | Savings |
|---|---|---|---|
| Size | 6.67 GB | 1.59 GB | 5.08 GB |
| Precision | bfloat16 | 4-bit INT4 | 4x fewer bits per weight |
| Compression ratio | 1x | 4x | ~75% size reduction |
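The headline numbers follow directly from the bit widths; a quick sanity check on the table above:

```python
# bfloat16 stores 16 bits per weight, INT4 stores 4: a 4x ratio
ratio = 16 / 4

# 6.67 GB of bfloat16 weights compressed 4x
expected_gb = 6.67 / ratio
print(expected_gb)  # close to the shipped 1.59 GB file
```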
**Files:**

- `model.safetensors` (1.59 GB) - the compressed 4-bit model
- `load_4bit.py` - Python script to unpack and load the model
- `quantization_config.json` - quantization parameters
- `config.json` - model configuration

**Usage:**

```python
from transformers import AutoTokenizer, AutoModel
from load_4bit import load_quantized_model
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-gptq-4bit",
    trust_remote_code=True
)

# Load and unpack the 4-bit model
state_dict = load_quantized_model("./model_folder")

# Load the unpacked weights into the base model architecture
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR",
    trust_remote_code=True
)
model.load_state_dict(state_dict, strict=False)
```
**Manual unpacking:**

```python
from safetensors.torch import load_file
import torch

# Load packed weights
tensors = load_file("model.safetensors")

# Unpack 4-bit weights (see load_4bit.py for the full implementation)
def unpack_4bit(packed):
    rows, packed_cols = packed.shape
    unpacked = torch.zeros((rows, packed_cols * 2), dtype=torch.uint8)
    unpacked[:, 0::2] = (packed >> 4) & 0x0F  # high nibble
    unpacked[:, 1::2] = packed & 0x0F         # low nibble
    return unpacked

# Dequantize: unpack each packed tensor and apply its stored scale
for key in tensors:
    if key.endswith('.weight_packed'):
        packed = tensors[key]
        scale = tensors[key.replace('.weight_packed', '.scale')]
        weights = unpack_4bit(packed).float() * scale
```
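To see why a scale lets 4-bit codes approximate float weights, here is a simplified symmetric quantize/dequantize round trip in plain Python. The abs-max scale rule and function names are illustrative assumptions; the scales in this repo come from GPTQ calibration, not this rule.

```python
def quantize_4bit(weights):
    # Symmetric per-tensor scale mapping floats to 4-bit codes in [0, 15],
    # with 8 as the zero code (simplified illustration only)
    scale = max(abs(w) for w in weights) / 7.0
    codes = [min(15, max(0, round(w / scale) + 8)) for w in weights]
    return codes, scale

def dequantize_4bit(codes, scale):
    return [(c - 8) * scale for c in codes]

weights = [0.33, -0.7, 0.05, 0.7]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)
# Each weight is recovered to within half a quantization step
assert max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2 + 1e-9
```

The reconstruction error is bounded by half a quantization step, which is why a well-chosen scale matters far more than the 4-bit codes themselves.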
**Citation:**

```bibtex
@article{frantar2023gptq,
  title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers},
  author={Frantar, Elias and Ashkboos, Saleh and Hoefler, Torsten and Alistarh, Dan},
  journal={arXiv preprint arXiv:2210.17323},
  year={2023}
}
```
Inherits license from base model: deepseek-ai/DeepSeek-OCR
Need help? Check load_4bit.py for usage examples.