---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x
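
The derived figures can be checked from the raw numbers in the list. The short sketch below recomputes the compression ratio and the size reduction, and counts the quantized layers that the list attributes to neither vision nor language. The variable names are illustrative. That these remaining layers correspond to the "other" category of `layer_analysis.json` is an assumption, not something this card states.

```python
# Figures copied from the Quantization Details list.
original_mb = 6363.12
compressed_mb = 3351.56
total_layers = 2342
vision_layers = 96
language_layers = 2197

compression_ratio = original_mb / compressed_mb
saved_mb = original_mb - compressed_mb
# Quantized layers that the list attributes to neither modality.
unattributed_layers = total_layers - vision_layers - language_layers

print(f"compression ratio:   {compression_ratio:.2f}x")
print(f"size reduction:      {saved_mb:.2f} MB ({saved_mb / original_mb:.1%})")
print(f"unattributed layers: {unattributed_layers}")
```

A ratio just under 2x is what one would expect if the original weights are stored in 16 bits and the quantized file also carries scales or a few unquantized tensors, although the card does not break the compressed size down.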

## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
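
The JSON files can be inspected before touching the weights. The schemas are not documented in this card, so the sketch below assumes only that the files are valid JSON and that they have already been downloaded to a local directory. `load_metadata` and `summarize` are hypothetical helper names, not part of this repository.

```python
import json
from pathlib import Path

# File names taken from the Model Files list; their schemas are not documented here.
METADATA_FILES = (
    "quantization_info.json",
    "layer_configs.json",
    "compression_stats.json",
    "layer_analysis.json",
)


def load_metadata(directory):
    """Return {file name: parsed JSON} for each metadata file found in `directory`."""
    loaded = {}
    for name in METADATA_FILES:
        path = Path(directory) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                loaded[name] = json.load(handle)
    return loaded


def summarize(metadata):
    """Describe the top-level shape of each file without assuming a schema."""
    summary = {}
    for name, content in metadata.items():
        if isinstance(content, dict):
            summary[name] = f"dict with {len(content)} keys"
        elif isinstance(content, list):
            summary[name] = f"list with {len(content)} items"
        else:
            summary[name] = type(content).__name__
    return summary


if __name__ == "__main__":
    # Looks in the current directory; prints an empty dict if no files are present.
    print(summarize(load_metadata(".")))
```

Printing only the top-level shape is a deliberate choice: it shows what each file contains without guessing at key names.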

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "SamMikaelson/deepseek-ocr-int8-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# Download the quantized weights from the Hub, then load them on CPU
weights_path = hf_hub_download(repo_id=repo_id, filename="quantized_weights.pt")
state_dict = torch.load(weights_path, map_location="cpu")

# Note: the state dict alone is not a runnable model. You'll need the
# QuantizedLinear class to properly load and use this model.
```
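
This card does not include the `QuantizedLinear` class it refers to, so the following is only a minimal sketch of what such a layer could look like. It assumes symmetric INT8 weights with one scale per output row. The buffer names `weight_int8` and `scale`, the `from_linear` helper, and the per-row scheme are all assumptions. The keys in `quantized_weights.pt` may differ, so treat this as an illustration rather than the class needed to load this checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizedLinear(nn.Module):
    """Hypothetical stand-in: INT8 weights plus one float scale per output row."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # Buffers (not parameters): saved in state_dict but never trained.
        self.register_buffer(
            "weight_int8", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scale", torch.ones(out_features, 1))
        self.register_buffer("bias", torch.zeros(out_features) if bias else None)

    @classmethod
    def from_linear(cls, linear):
        """Quantize an nn.Linear with symmetric per-row INT8 quantization."""
        layer = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        weight = linear.weight.detach().float()
        # Map the largest magnitude in each row to 127; clamp avoids division by zero.
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        layer.weight_int8.copy_(torch.round(weight / scale).clamp(-127, 127).to(torch.int8))
        layer.scale.copy_(scale)
        if linear.bias is not None:
            layer.bias.copy_(linear.bias.detach().float())
        return layer

    def forward(self, x):
        # Dequantize on the fly, then run an ordinary linear layer.
        weight = self.weight_int8.to(x.dtype) * self.scale.to(x.dtype)
        bias = None if self.bias is None else self.bias.to(x.dtype)
        return F.linear(x, weight, bias)
```

Under these assumptions, `QuantizedLinear.from_linear(nn.Linear(16, 8))` yields a layer whose outputs closely track the original, with the weight stored in one byte per element.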

## Baseline Characteristics

This uniform quantization approach:

- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods
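
The uniform policy described above can be sketched in a few lines: every layer gets the same bit-width whatever its modality, so the parameter-weighted average is exactly 8. The layer names, shapes, and the symmetric per-tensor scheme below are assumptions chosen for illustration, not the configuration used to produce this model.

```python
import numpy as np

UNIFORM_BITS = 8  # the single bit-width this baseline applies everywhere


def uniform_allocation(layer_names):
    """Give every layer the same bit-width, ignoring whether it is vision or language."""
    return {name: UNIFORM_BITS for name in layer_names}


def average_bit_width(allocation, layers):
    """Parameter-weighted mean bit-width across all layers."""
    total = sum(w.size for w in layers.values())
    return sum(allocation[name] * w.size for name, w in layers.items()) / total


def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization; returns (int8 array, float scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
# Hypothetical layer names and shapes, chosen only to show one layer per modality.
layers = {
    "vision.proj": rng.normal(0.0, 0.02, size=(32, 64)).astype(np.float32),
    "language.proj": rng.normal(0.0, 0.02, size=(64, 128)).astype(np.float32),
}

allocation = uniform_allocation(layers)
quantized = {name: quantize_int8(w) for name, w in layers.items()}

print("bit allocation:", allocation)
print(f"average bit-width: {average_bit_width(allocation, layers):.2f}")
for name, (q, scale) in quantized.items():
    max_error = float(np.max(np.abs(q.astype(np.float32) * scale - layers[name])))
    print(f"{name}: max abs reconstruction error = {max_error:.6f}")
```

A modality-aware method would replace `uniform_allocation` with a policy that assigns different bit-widths to vision and language layers, which is the comparison this baseline is meant to support.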

## Citation

If you use this model, please cite the original model and mention the uniform quantization approach.