SamMikaelson committed
Commit 46cf7ea · verified · 1 parent: 02ad03f

Upload folder using huggingface_hub

README.md ADDED
---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Other Layers**: 49 @ 8-bit (see `layer_analysis.json`)
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x

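The per-tensor symmetric scheme these numbers imply can be sketched as follows. This is a minimal illustration, not the repo's actual quantization code, and the function names are made up here: the largest absolute weight maps to code 127, and every weight is rounded onto that grid.

```python
def quantize_int8(weights):
    """Per-tensor symmetric INT8 quantization: the largest absolute
    weight maps to 127; every other weight is rounded onto that grid."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from INT8 codes and the scale."""
    return [c * scale for c in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8(w)   # q = [50, -127, 3], scale = 0.01
w_hat = dequantize_int8(q, scale)
```

Storing one int8 code per weight plus a single float scale per tensor is what drives the roughly 2x size reduction reported above.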
## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True
)

# Download and load the quantized weights (CPU-safe)
weights_path = hf_hub_download(
    repo_id="SamMikaelson/deepseek-ocr-int8-quantized",
    filename="quantized_weights.pt",
)
state_dict = torch.load(weights_path, map_location="cpu")
# Note: you will need the QuantizedLinear class to properly load and use this model
```

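Since the `QuantizedLinear` class itself is not included in this card, here is a minimal stand-in showing what such a layer typically does: hold the INT8 codes plus the saved scale and dequantize on the fly at call time. The actual class in this repo may differ.

```python
class QuantizedLinear:
    """Minimal stand-in (NOT the repo's actual class): stores INT8 weight
    codes and one per-tensor float scale; dequantizes at call time."""

    def __init__(self, q_weight, scale, bias=None):
        self.q_weight = q_weight  # int8 codes, rows of shape (out, in)
        self.scale = scale        # per-tensor scale saved at quantization time
        self.bias = bias or [0.0] * len(q_weight)

    def __call__(self, x):
        # Dequantize each weight (code * scale) and take the dot products.
        return [
            sum(c * self.scale * xi for c, xi in zip(row, x)) + b
            for row, b in zip(self.q_weight, self.bias)
        ]

# Codes [[100, -50]] at scale 0.01 represent the weights [[1.0, -0.5]]
layer = QuantizedLinear(q_weight=[[100, -50]], scale=0.01)
y = layer([2.0, 4.0])  # 2*1.0 + 4*(-0.5) = 0.0
```

A real implementation would be a `torch.nn.Module` operating on tensors, but the bookkeeping (codes + scale, dequantize before the matmul) is the same idea.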
## Baseline Characteristics

This uniform quantization approach:

- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods

## Citation

If you use this model, please cite the original model and mention the uniform quantization approach.
compression_stats.json ADDED
{
  "original_params": 3336106240,
  "quantized_layers": 2342,
  "uniform_bits": 8,
  "avg_bit_width": 8.0,
  "original_size_mb": 6363.11767578125,
  "compressed_size_mb": 3351.557418823242,
  "compression_ratio": 1.898555471568018,
  "vision_layers_quantized": 96,
  "language_layers_quantized": 2197,
  "actual_size_reduction": true,
  "method": "uniform"
}
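A quick sanity check on these figures, assuming the original weights are FP16 (2 bytes per parameter): 3,336,106,240 parameters reproduce the reported original size exactly, and an ideal all-INT8 file would be half of it. The roughly 170 MB above that ideal in the compressed size is scale/metadata overhead plus parameters left unquantized.

```python
params = 3_336_106_240
orig_mb = params * 2 / 1024**2       # FP16: 2 bytes per parameter
ideal_int8_mb = params / 1024**2     # pure INT8: 1 byte per parameter
compressed_mb = 3351.557418823242    # reported value from the stats above
ratio = orig_mb / compressed_mb      # matches the reported 1.8986x
overhead_mb = compressed_mb - ideal_int8_mb
```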
layer_analysis.json ADDED
layer_configs.json ADDED
quantization_info.json ADDED
quantized_weights.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
size 3516781114
special_tokens_map.json ADDED
{
  "additional_special_tokens": [
    "<|User|>",
    "<|Assistant|>"
  ],
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|▁pad▁|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
tokenizer_config.json ADDED