SamMikaelson committed on
Commit f34ca0c · verified · 1 Parent(s): 0c6623c

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +33 -115
  2. layer_analysis.json +0 -0
  3. quantized_weights.pt +3 -0
README.md CHANGED
@@ -1,139 +1,57 @@
  ---
- language: en
- license: apache-2.0
  tags:
  - quantization
- - deepseek
- - ocr
- - document-understanding
- - random-quantization
- base_model: deepseek-ai/DeepSeek-OCR
- pipeline_tag: image-to-text
  ---

- # DeepSeek-OCR Random Quantized Model (Standalone)

- This is a **fully standalone randomly quantized** version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

- ⚠️ **Note**: This model uses random quantization as a baseline for comparison. It is NOT optimized and will have significant quality degradation. This serves as a lower bound for intelligent quantization methods.

- ## Model Details
-
- ### Quantization Statistics
- - **Method**: Random Quantization (Baseline)
- - **Compression Ratio**: 1.90x
- - **Average Bit-Width**: 8.00 bits
  - **Original Size**: 6363.12 MB
  - **Compressed Size**: 3351.56 MB
- - **Size Reduction**: ~47.3%
-
- ### Architecture
- Based on DeepSeek-OCR with custom `QuantizedLinear` layers that perform on-the-fly dequantization during inference.
-
- ## Usage
-
- ### Basic Loading
-
- ```python
- from transformers import AutoModel, AutoTokenizer
- import torch

- # Load model and tokenizer (no base model needed!)
- model_name = "SamMikaelson/deepseek-ocr-int8-quantized"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- model = AutoModel.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16
- ).to("cuda")

- # The model is ready to use!
- ```

- ### For Document OCR

  ```python
- from transformers import AutoProcessor
  import torch
- from PIL import Image
-
- # Load
- processor = AutoProcessor.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)
- model = AutoModel.from_pretrained(
-     "SamMikaelson/deepseek-ocr-int8-quantized",
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16
- ).to("cuda")

- # Inference
- image = Image.open("document.jpg")
- prompt = "<image>\n<|grounding|>Convert the document to markdown."

- # Process and generate
- inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
- outputs = model.generate(**inputs, max_length=2048)
- result = processor.decode(outputs[0], skip_special_tokens=True)
-
- print(result)
  ```

- ## Performance Characteristics
-
- ### Quality Metrics (Expected)
- - **NLS (Normalized Levenshtein Similarity)**: Significantly degraded (~0.01-0.1)
- - **WER (Word Error Rate)**: High error rate (20-50)
- - **Output Generation**: May produce nonsensical outputs due to random quantization
-
- ### Speed Metrics
- - **Inference Latency**: Comparable to original (dequantization overhead)
- - **Memory Usage**: ~47.3% reduction
-
- ## Limitations
-
- ⚠️ **This is a baseline model for research purposes:**
-
- 1. **Quality Degradation**: Random quantization severely impacts model quality
- 2. **Not Production-Ready**: This model is for comparison/research only
- 3. **Baseline Purpose**: Demonstrates the lower bound of quantization quality
-
- ### Why This Model Exists
-
- This model serves as a **sanity check** and **lower bound** for intelligent quantization methods:
- - Shows what happens with no quantization intelligence
- - Provides a baseline to compare against optimized methods
- - Validates that your evaluation metrics can detect poor quantization

- ## Better Alternatives
-
- For production use, consider:
- - **Sensitivity-aware quantization**: Quantize less important layers more aggressively
- - **Mixed-precision methods**: Use different bit-widths per layer based on importance
- - **Quantization-aware training**: Fine-tune after quantization
- - **GPTQ/AWQ**: State-of-the-art quantization methods
-
- ## Files Included
-
- - `model.safetensors` or `pytorch_model.bin`: Complete model with quantized weights
- - `config.json`: Model configuration
- - `tokenizer.json`, `tokenizer_config.json`: Tokenizer files
- - `layer_configs.json`: Per-layer quantization settings
- - `quantization_info.json`: Quantization metadata
- - `compression_stats.json`: Compression statistics

  ## Citation

- ```bibtex
- @misc{deepseek-ocr-random-quantized,
-   title={DeepSeek-OCR Random Quantized Model},
-   author={SamMikaelson},
-   year={2024},
-   publisher={Hugging Face},
-   howpublished={\url{https://huggingface.co/SamMikaelson/deepseek-ocr-int8-quantized}}
- }
- ```
-
- Original DeepSeek-OCR model: [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
-
- ## License
-
- Apache 2.0 (same as base model)
 
  ---
+ license: mit
+ base_model: deepseek-ai/DeepSeek-OCR
  tags:
  - quantization
+ - int8
+ - uniform-quantization
+ - model-compression
  ---

+ # Uniform INT8 Quantized DeepSeek-OCR

+ This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

+ ## Quantization Details

+ - **Method**: Uniform INT8 quantization
+ - **Quantized Layers**: 2342
+ - **Vision Layers**: 96 @ 8-bit
+ - **Language Layers**: 2197 @ 8-bit
+ - **Average Bit-width**: 8.00
  - **Original Size**: 6363.12 MB
  - **Compressed Size**: 3351.56 MB
+ - **Compression Ratio**: 1.90x
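The size figures quoted in the card are internally consistent, and the ratio lands slightly under the ideal 2x for a bf16-to-int8 conversion, presumably because quantization scale metadata and any unquantized tensors remain at full precision. A quick arithmetic check on the numbers above:

```python
# Sanity-check the compression statistics quoted in the model card.
original_mb = 6363.12    # bf16 checkpoint size from the card
compressed_mb = 3351.56  # quantized checkpoint size from the card

ratio = original_mb / compressed_mb
reduction = 1 - compressed_mb / original_mb

print(f"compression ratio: {ratio:.2f}x")    # matches the card's 1.90x
print(f"size reduction:    {reduction:.1%}") # matches the old card's ~47.3%
```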
+ ## Model Files

+ - `quantized_weights.pt`: Quantized model weights
+ - `quantization_info.json`: Layer-wise quantization configuration
+ - `layer_configs.json`: Detailed layer configurations
+ - `compression_stats.json`: Compression statistics
+ - `layer_analysis.json`: Modality analysis (vision/language/other)

+ ## Usage

  ```python
  import torch
+ from transformers import AutoTokenizer

+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)

+ # Load quantized weights
+ state_dict = torch.load("quantized_weights.pt")
+ # Note: You'll need the QuantizedLinear class to properly load and use this model
  ```
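The new card's usage snippet notes that a `QuantizedLinear` class is required but does not show it. As a rough illustration only — the actual class ships with the repo's `trust_remote_code` sources and its layout may differ — an on-the-fly dequantizing linear layer with symmetric per-channel INT8 weights can be sketched like this:

```python
import torch
import torch.nn as nn


class QuantizedLinear(nn.Module):
    """Hypothetical sketch: int8 weights + per-output-channel scales,
    dequantized on the fly in forward(). Not the repo's actual class."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.register_buffer(
            "qweight", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scale", torch.ones(out_features, 1))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    @classmethod
    def from_float(cls, linear: nn.Linear):
        # Symmetric per-channel quantization: scale = max|w| / 127
        q = cls(linear.in_features, linear.out_features, linear.bias is not None)
        w = linear.weight.detach().float()
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        q.qweight.copy_(torch.round(w / scale).clamp(-127, 127).to(torch.int8))
        q.scale.copy_(scale)
        if linear.bias is not None:
            q.bias.data.copy_(linear.bias.detach())
        return q

    def forward(self, x):
        # Dequantize weights on the fly, then apply a standard linear map
        w = self.qweight.float() * self.scale
        return nn.functional.linear(x, w.to(x.dtype), self.bias)
```

With a module like this, the loaded `state_dict` would be mapped onto a model whose `nn.Linear` layers were swapped for `QuantizedLinear` instances; the per-layer settings in `layer_configs.json` presumably drive that swap.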
+ ## Baseline Characteristics

+ This uniform quantization approach:
+ - Applies the **same 8-bit** quantization to ALL layers
+ - **Does not distinguish** between vision and language modalities
+ - Serves as a **baseline** for comparison with modality-aware methods

  ## Citation

+ If you use this model, please cite the original model and mention the uniform quantization approach.
layer_analysis.json ADDED
The diff for this file is too large to render. See raw diff
 
quantized_weights.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
+ size 3516781114