---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x
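
The figures above can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers listed in this section; the variable names are illustrative, and treating the vision and language counts as disjoint subsets of the quantized layers is an assumption.

```python
# Sizes and layer counts copied from the list above.
original_mb = 6363.12
compressed_mb = 3351.56
total_layers, vision_layers, language_layers = 2342, 96, 2197

# Compression ratio is original size divided by compressed size.
ratio = original_mb / compressed_mb
# Fraction of the original size that was removed.
saved_fraction = 1 - compressed_mb / original_mb
# Assumption: vision and language layers do not overlap, so the remainder
# is whatever the modality analysis labels as "other".
other_layers = total_layers - vision_layers - language_layers

print(f"compression ratio: {ratio:.2f}x")
print(f"size reduction: {saved_fraction:.1%}")
print(f"layers outside vision/language: {other_layers}")
```

Under that assumption, 49 of the quantized layers belong to neither the vision nor the language group.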

## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
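
The schema of the JSON files is not documented in this card, so a cautious first step is to inspect their top-level structure before relying on any particular key. The helper below, `summarize_json`, is a hypothetical name introduced for illustration; it assumes only that the files are valid JSON and have been downloaded to the current directory.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {"type": "dict", "size": len(data), "sample_keys": list(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "sample_keys": []}
    return {"type": type(data).__name__, "size": 1, "sample_keys": []}


if __name__ == "__main__":
    # File names taken from the list above; files that are missing are skipped.
    for name in (
        "quantization_info.json",
        "layer_configs.json",
        "compression_stats.json",
        "layer_analysis.json",
    ):
        if Path(name).exists():
            print(name, summarize_json(name))
```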

## Usage

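```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "SamMikaelson/deepseek-ocr-int8-quantized"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# Download the quantized weights from the Hub, then load them on CPU
weights_path = hf_hub_download(repo_id=repo_id, filename="quantized_weights.pt")
state_dict = torch.load(weights_path, map_location="cpu")
# Note: You'll need the QuantizedLinear class to properly load and use this model
```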
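
The `QuantizedLinear` class itself is not included in this card. As a rough illustration of what such a layer could look like, the sketch below stores INT8 weights with one floating-point scale per output row and dequantizes them in the forward pass. Everything here is an assumption: the buffer names (`weight_int8`, `scale`), the symmetric per-row scaling, and the `from_linear` helper are hypothetical and may not match the layout of `quantized_weights.pt`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizedLinear(nn.Module):
    """Hypothetical INT8 linear layer (not the class used to create this model)."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # INT8 weights plus one float scale per output row.
        self.register_buffer(
            "weight_int8", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scale", torch.ones(out_features, 1))
        self.register_buffer("bias", torch.zeros(out_features) if bias else None)

    @classmethod
    def from_linear(cls, linear):
        """Quantize an existing nn.Linear with symmetric uniform INT8 quantization."""
        q = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        w = linear.weight.detach().float()
        # Map the largest absolute weight in each row to 127.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        q.weight_int8.copy_(torch.round(w / scale).clamp(-127, 127).to(torch.int8))
        q.scale.copy_(scale)
        if linear.bias is not None:
            q.bias.copy_(linear.bias.detach().float())
        return q

    def forward(self, x):
        # Dequantize on the fly, then apply a regular linear transform.
        w = self.weight_int8.to(x.dtype) * self.scale.to(x.dtype)
        bias = self.bias.to(x.dtype) if self.bias is not None else None
        return F.linear(x, w, bias)
```

For example, `QuantizedLinear.from_linear(torch.nn.Linear(16, 4))` yields a layer whose outputs approximate those of the original layer. Before mapping the checkpoint onto a class like this, check the key names in `state_dict` against the buffers it expects.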

## Baseline Characteristics

This uniform quantization approach:
- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods
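
To make the baseline concrete, the sketch below assigns one bit-width to every layer regardless of modality and computes the resulting average. The layer names are hypothetical placeholders, not names taken from the actual model, and the average is a simple unweighted mean over layers.

```python
def uniform_bit_config(layer_names, bits=8):
    """Assign the same bit-width to every layer, ignoring modality."""
    return {name: bits for name in layer_names}


def average_bit_width(config):
    """Unweighted mean bit-width over all configured layers."""
    return sum(config.values()) / len(config)


# Hypothetical layer names, for illustration only.
layers = [
    "vision.blocks.0.attn.qkv",
    "language.layers.0.mlp.up_proj",
    "projector.fc",
]
config = uniform_bit_config(layers)
print(f"average bit-width: {average_bit_width(config):.2f}")
```

Because every layer receives the same value, the average equals the chosen bit-width whether it is weighted by layer count or by parameter count. A modality-aware method would instead return different values for vision and language layers.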

## Citation

If you use this model, please cite the original [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) model and note that uniform INT8 quantization was applied.