Upload folder using huggingface_hub
- README.md +57 -0
- compression_stats.json +13 -0
- layer_analysis.json +0 -0
- layer_configs.json +0 -0
- quantization_info.json +0 -0
- quantized_weights.pt +3 -0
- special_tokens_map.json +27 -0
- tokenizer.json +0 -0
- tokenizer_config.json +0 -0
README.md
ADDED
---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR
This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x
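The figures above correspond to standard symmetric INT8 quantization. As a hedged illustration of the round-trip (a minimal per-tensor sketch, not necessarily the exact scheme used to produce this checkpoint, which may be per-channel or otherwise differ):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: int8 codes plus one float scale."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the int8 codes."""
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
```

Storing 8-bit codes in place of 16-bit weights is what drives the roughly 2x size reduction reported above.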
## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
## Usage

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True
)

# Load quantized weights (on CPU; move to GPU after reconstructing the model)
state_dict = torch.load("quantized_weights.pt", map_location="cpu")
# Note: you'll need the QuantizedLinear class to properly load and use this model.
```
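The repository does not ship the `QuantizedLinear` class mentioned above, so the following is only a hedged sketch of what such a module might look like (a hypothetical layout with an int8 weight buffer and a per-tensor scale; the actual checkpoint's storage format may differ):

```python
import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    """Hypothetical INT8 linear layer: holds int8 weights plus a float scale
    and dequantizes on the fly in forward()."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        # Buffers (not Parameters): they are loaded from the state dict, not trained.
        self.register_buffer(
            "weight_int8",
            torch.zeros(out_features, in_features, dtype=torch.int8),
        )
        self.register_buffer("scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize to the activation dtype, then apply a standard linear map.
        w = self.weight_int8.to(x.dtype) * self.scale
        return nn.functional.linear(x, w, self.bias)

layer = QuantizedLinear(8, 4)
out = layer(torch.randn(2, 8))
```

To use the checkpoint, you would replace each quantized `nn.Linear` in the base model with such a module before calling `load_state_dict` with the keys actually present in `quantized_weights.pt`.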
## Baseline Characteristics

This uniform quantization approach:
- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods

## Citation

If you use this model, please cite the original model and mention the uniform quantization approach.
compression_stats.json
ADDED
{
  "original_params": 3336106240,
  "quantized_layers": 2342,
  "uniform_bits": 8,
  "avg_bit_width": 8.0,
  "original_size_mb": 6363.11767578125,
  "compressed_size_mb": 3351.557418823242,
  "compression_ratio": 1.898555471568018,
  "vision_layers_quantized": 96,
  "language_layers_quantized": 2197,
  "actual_size_reduction": true,
  "method": "uniform"
}
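These statistics are internally consistent: the original size matches the parameter count at 16 bits (2 bytes) per weight, and the compression ratio is the quotient of the two sizes. A quick check, using only the numbers from the JSON above:

```python
# Figures copied from compression_stats.json.
original_mb = 6363.11767578125
compressed_mb = 3351.557418823242
params = 3336106240

# Original size = parameter count * 2 bytes, expressed in MiB.
fp16_mb = params * 2 / 1024**2

# Compression ratio = original size / compressed size.
ratio = original_mb / compressed_mb
```

The ratio lands slightly below 2.0x, presumably because per-layer quantization scales and any parameters kept in higher precision add overhead on top of the raw 16-bit to 8-bit halving.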
layer_analysis.json
ADDED
The diff for this file is too large to render.
layer_configs.json
ADDED
The diff for this file is too large to render.
quantization_info.json
ADDED
The diff for this file is too large to render.
quantized_weights.pt
ADDED
|
version https://git-lfs.github.com/spec/v1
oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
size 3516781114
special_tokens_map.json
ADDED
|
{
  "additional_special_tokens": [
    "<|User|>",
    "<|Assistant|>"
  ],
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|▁pad▁|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
The diff for this file is too large to render.