Upload folder using huggingface_hub
- README.md +33 -115
- layer_analysis.json +0 -0
- quantized_weights.pt +3 -0

README.md CHANGED

@@ -1,139 +1,57 @@
 ---
-license: apache-2.0
 tags:
 - quantization
-- random-quantization
-base_model: deepseek-ai/DeepSeek-OCR
-pipeline_tag: image-to-text
 ---

-# DeepSeek-OCR Random Quantized Model

-This is a randomly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

-- **Average Bit-Width**: 8.00 bits
 - **Original Size**: 6363.12 MB
 - **Compressed Size**: 3351.56 MB

-### Architecture
-Based on DeepSeek-OCR with custom `QuantizedLinear` layers that perform on-the-fly dequantization during inference.

-## Usage

-### Basic Loading

-```python
-from transformers import AutoModel, AutoTokenizer
-import torch

-model_name = "SamMikaelson/deepseek-ocr-int8-quantized"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModel.from_pretrained(
-    model_name,
-    trust_remote_code=True,
-    torch_dtype=torch.bfloat16
-).to("cuda")
-```

 ```python
-from transformers import AutoModel, AutoProcessor
 import torch
-from PIL import Image

-# Load model and processor
-processor = AutoProcessor.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)
-model = AutoModel.from_pretrained(
-    "SamMikaelson/deepseek-ocr-int8-quantized",
-    trust_remote_code=True,
-    torch_dtype=torch.bfloat16
-).to("cuda")

-# Prompt for document OCR
-prompt = "<image>\n<|grounding|>Convert the document to markdown."

-# Run inference on an input image
-image = Image.open("document.png")
-inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
-outputs = model.generate(**inputs, max_new_tokens=1024)

-result = processor.decode(outputs[0], skip_special_tokens=True)

-print(result)
 ```

-## Performance

-### Quality Metrics (Expected)
-- **NLS (Normalized Levenshtein Similarity)**: Significantly degraded (~0.01-0.1)
-- **WER (Word Error Rate)**: High (20-50)
-- **Output Generation**: May produce nonsensical outputs due to random quantization

-### Speed Metrics
-- **Inference Latency**: Comparable to the original (plus dequantization overhead)
-- **Memory Usage**: ~47.3% reduction

-## Limitations

-⚠️ **This is a baseline model for research purposes:**

-1. **Quality Degradation**: Random quantization severely impacts model quality
-2. **Not Production-Ready**: This model is for comparison/research only
-3. **Baseline Purpose**: Demonstrates the lower bound of quantization quality

-### Why This Model Exists

-This model serves as a **sanity check** and **lower bound** for intelligent quantization methods:
-- Shows what happens with no quantization intelligence
-- Provides a baseline to compare against optimized methods
-- Validates that your evaluation metrics can detect poor quantization

-For better quality, consider:
-- **Mixed-precision methods**: Use different bit-widths per layer based on importance
-- **Quantization-aware training**: Fine-tune after quantization
-- **GPTQ/AWQ**: State-of-the-art quantization methods

-## Files Included

-- `model.safetensors` or `pytorch_model.bin`: Complete model with quantized weights
-- `config.json`: Model configuration
-- `tokenizer.json`, `tokenizer_config.json`: Tokenizer files
-- `layer_configs.json`: Per-layer quantization settings
-- `quantization_info.json`: Quantization metadata
-- `compression_stats.json`: Compression statistics

 ## Citation

-```bibtex
-@misc{deepseek-ocr-random-quantized,
-  title={DeepSeek-OCR Random Quantized Model},
-  author={SamMikaelson},
-  year={2024},
-  publisher={Hugging Face},
-  howpublished={\url{https://huggingface.co/SamMikaelson/deepseek-ocr-int8-quantized}}
-}
-```

-Original DeepSeek-OCR model: [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)

-## License

-Apache 2.0 (same as base model)

 ---
+license: mit
+base_model: deepseek-ai/DeepSeek-OCR
 tags:
 - quantization
+- int8
+- uniform-quantization
+- model-compression
 ---

+# Uniform INT8 Quantized DeepSeek-OCR

+This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

+## Quantization Details

+- **Method**: Uniform INT8 quantization
+- **Quantized Layers**: 2342
+- **Vision Layers**: 96 @ 8-bit
+- **Language Layers**: 2197 @ 8-bit
+- **Average Bit-Width**: 8.00 bits
 - **Original Size**: 6363.12 MB
 - **Compressed Size**: 3351.56 MB
+- **Compression Ratio**: 1.90x
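
As a point of reference, here is a minimal sketch of per-tensor symmetric INT8 quantization; the helpers below are illustrative, not the exact code used to produce `quantized_weights.pt`, and the real per-layer settings are recorded in `quantization_info.json` and `layer_configs.json`:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Per-tensor symmetric INT8 quantization (illustrative sketch)."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0            # map [-max|w|, max|w|] onto [-127, 127]
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights from INT8 values and a scale."""
    return q.to(torch.float32) * scale

# One INT8 byte per weight instead of two bf16 bytes is what yields the
# ~1.90x ratio reported above; scales and any unquantized tensors account
# for it being slightly under 2x.
w = torch.randn(1024, 1024)
q, s = quantize_int8(w)
print((w - dequantize_int8(q, s)).abs().max())  # small reconstruction error
```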

+## Model Files

+- `quantized_weights.pt`: Quantized model weights
+- `quantization_info.json`: Layer-wise quantization configuration
+- `layer_configs.json`: Detailed layer configurations
+- `compression_stats.json`: Compression statistics
+- `layer_analysis.json`: Modality analysis (vision/language/other)
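
The JSON files can be inspected directly to see how each layer was handled; the snippet below makes no assumption about the exact schema, which is defined by the files themselves:

```python
import json

# Print a short preview of each metadata file to discover its schema.
for path in ["quantization_info.json", "layer_configs.json", "compression_stats.json"]:
    with open(path) as f:
        data = json.load(f)
    print(path, "->", str(data)[:200])
```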

+## Usage

 ```python
 import torch
+from transformers import AutoTokenizer

+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)

+# Load quantized weights
+state_dict = torch.load("quantized_weights.pt")
+# Note: You'll need the QuantizedLinear class to properly load and use this model
 ```
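
The `QuantizedLinear` class itself is not shipped on the card; here is a minimal sketch of what an on-the-fly dequantizing linear layer can look like. The buffer names `weight_q` and `scale` are assumptions and should be matched against the keys actually present in `quantized_weights.pt`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantizedLinear(nn.Module):
    """Sketch of a linear layer that stores INT8 weights plus a scale
    and dequantizes on the fly in forward(), as the card describes."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        # INT8 weights with a single per-tensor scale; a real checkpoint may
        # use per-channel scales or different buffer names.
        self.register_buffer("weight_q", torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize to the activation dtype, then apply a standard linear op.
        w = self.weight_q.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)

# Smoke test of the sketch
layer = QuantizedLinear(16, 8)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 8])
```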

+## Baseline Characteristics

+This uniform quantization approach:
+- Applies the **same 8-bit** quantization to ALL layers
+- **Does not distinguish** between vision and language modalities
+- Serves as a **baseline** for comparison with modality-aware methods (sketched below)
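
To make the contrast with modality-aware schemes concrete, here is a hypothetical illustration; the layer names and the 4-bit choice are invented for the example, and this model's real assignments live in `layer_configs.json`:

```python
# Illustrative layer names; real names come from the model's state dict.
layer_names = ["vision.blocks.0.attn.qkv", "language.layers.0.mlp.up_proj"]

# This model: the same bit-width everywhere, regardless of modality.
uniform_bits = {name: 8 for name in layer_names}

# A modality-aware method might instead assign bit-widths per modality.
modality_aware_bits = {name: 4 if name.startswith("vision") else 8 for name in layer_names}

print(uniform_bits)
print(modality_aware_bits)
```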

 ## Citation

+If you use this model, please cite the original model and mention the uniform quantization approach.

layer_analysis.json ADDED

The diff for this file is too large to render; see the raw diff.
quantized_weights.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
+size 3516781114
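
Since `quantized_weights.pt` is stored through Git LFS, the repository itself holds only the pointer above; one way to fetch the actual ~3.5 GB file is via `huggingface_hub`, the library named in the commit title:

```python
from huggingface_hub import hf_hub_download

# Resolves the LFS pointer and downloads the real weights file to the local cache.
path = hf_hub_download(
    repo_id="SamMikaelson/deepseek-ocr-int8-quantized",
    filename="quantized_weights.pt",
)
print(path)
```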