SamMikaelson committed on
Commit
0c6623c
·
verified ·
1 Parent(s): 4c79198

Upload standalone random quantized model

README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - quantization
+ - deepseek
+ - ocr
+ - document-understanding
+ - random-quantization
+ base_model: deepseek-ai/DeepSeek-OCR
+ pipeline_tag: image-to-text
+ ---
+
+ # DeepSeek-OCR Random Quantized Model (Standalone)
+
+ This is a **fully standalone, randomly quantized** version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).
+
+ ⚠️ **Note**: This model uses random quantization as a baseline for comparison. It is NOT optimized and shows significant quality degradation; it serves as a lower bound for intelligent quantization methods.
+
+ ## Model Details
+
+ ### Quantization Statistics
+ - **Method**: Random Quantization (Baseline)
+ - **Compression Ratio**: 1.90x
+ - **Average Bit-Width**: 8.00 bits
+ - **Original Size**: 6363.12 MB
+ - **Compressed Size**: 3351.56 MB
+ - **Size Reduction**: ~47.3%
+
+ ### Architecture
+ Based on DeepSeek-OCR, with custom `QuantizedLinear` layers that perform on-the-fly dequantization during inference.
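As an illustration of what such a layer does, here is a minimal NumPy sketch of per-tensor affine int8 quantization with dequantize-before-matmul. This is a hypothetical stand-in, not the `QuantizedLinear` shipped in this repo's remote code.

```python
import numpy as np

class QuantizedLinearSketch:
    """Hypothetical affine int8 linear layer with on-the-fly dequantization.

    Illustration only -- NOT the QuantizedLinear bundled with this repo.
    """

    def __init__(self, weight: np.ndarray):
        # Per-tensor affine quantization: w ~= scale * (q - zero_point)
        w_min, w_max = float(weight.min()), float(weight.max())
        self.scale = (w_max - w_min) / 255.0
        self.zero_point = round(-w_min / self.scale)
        q = np.round(weight / self.scale + self.zero_point)
        self.q_weight = np.clip(q, 0, 255).astype(np.uint8)  # stored at 8 bits

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Dequantize just before use, then apply an ordinary matmul
        w = self.scale * (self.q_weight.astype(np.float32) - self.zero_point)
        return x @ w.T

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 32)).astype(np.float32)
x = rng.normal(size=(4, 32)).astype(np.float32)

layer = QuantizedLinearSketch(w)
y_q = layer.forward(x)
y_fp = x @ w.T  # full-precision reference
print("max abs error:", float(np.abs(y_q - y_fp).max()))
```

The weights are held in memory at 8 bits (the source of the ~47% size reduction), while the matmul itself still runs in floating point, which is why latency stays comparable to the original model.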
32
+
33
+ ## Usage
34
+
35
+ ### Basic Loading
36
+
37
+ ```python
38
+ from transformers import AutoModel, AutoTokenizer
39
+ import torch
40
+
41
+ # Load model and tokenizer (no base model needed!)
42
+ model_name = "SamMikaelson/deepseek-ocr-int8-quantized"
43
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
44
+ model = AutoModel.from_pretrained(
45
+ model_name,
46
+ trust_remote_code=True,
47
+ torch_dtype=torch.bfloat16
48
+ ).to("cuda")
49
+
50
+ # The model is ready to use!
51
+ ```
52
+
53
+ ### For Document OCR
54
+
55
+ ```python
56
+ from transformers import AutoProcessor
57
+ import torch
58
+ from PIL import Image
59
+
60
+ # Load
61
+ processor = AutoProcessor.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)
62
+ model = AutoModel.from_pretrained(
63
+ "SamMikaelson/deepseek-ocr-int8-quantized",
64
+ trust_remote_code=True,
65
+ torch_dtype=torch.bfloat16
66
+ ).to("cuda")
67
+
68
+ # Inference
69
+ image = Image.open("document.jpg")
70
+ prompt = "<image>\n<|grounding|>Convert the document to markdown."
71
+
72
+ # Process and generate
73
+ inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
74
+ outputs = model.generate(**inputs, max_length=2048)
75
+ result = processor.decode(outputs[0], skip_special_tokens=True)
76
+
77
+ print(result)
78
+ ```
+
+ ## Performance Characteristics
+
+ ### Quality Metrics (Expected)
+ - **NLS (Normalized Levenshtein Similarity)**: Significantly degraded (~0.01-0.1)
+ - **WER (Word Error Rate)**: Very high (roughly 20-50)
+ - **Output Generation**: May produce nonsensical output due to the random quantization
+
+ ### Speed Metrics
+ - **Inference Latency**: Comparable to the original model, with added on-the-fly dequantization overhead
+ - **Memory Usage**: ~47.3% reduction
+
+ ## Limitations
+
+ ⚠️ **This is a baseline model for research purposes:**
+
+ 1. **Quality Degradation**: Random quantization severely impacts model quality
+ 2. **Not Production-Ready**: This model is for comparison and research only
+ 3. **Baseline Purpose**: Demonstrates the lower bound of quantization quality
+
+ ### Why This Model Exists
+
+ This model serves as a **sanity check** and **lower bound** for intelligent quantization methods:
+ - Shows what happens when quantization uses no information about layer importance
+ - Provides a baseline to compare optimized methods against
+ - Validates that your evaluation metrics can detect poor quantization
+
+ ## Better Alternatives
+
+ For production use, consider:
+ - **Sensitivity-aware quantization**: Quantize less important layers more aggressively
+ - **Mixed-precision methods**: Use different bit-widths per layer based on importance
+ - **Quantization-aware training**: Fine-tune the model after quantization
+ - **GPTQ/AWQ**: State-of-the-art post-training quantization methods
+
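To make the contrast with the random baseline concrete, sensitivity-aware bit allocation can be sketched in a few lines. This is a toy policy with made-up sensitivity scores, not the method used by any of the tools above: the most sensitive half of the layers keeps 8 bits, the rest drops to 4.

```python
import numpy as np

def allocate_bits(sensitivities, low=4, high=8):
    """Toy sensitivity-aware policy: keep the most sensitive half of the
    layers at `high` bits, drop the least sensitive half to `low` bits."""
    bits = np.full(len(sensitivities), high)
    order = np.argsort(sensitivities)            # least sensitive first
    bits[order[: len(sensitivities) // 2]] = low
    return bits

# Hypothetical per-layer sensitivity scores (e.g. from an output-error probe)
sens = [0.9, 0.1, 0.5, 0.05, 0.7, 0.2]
print(allocate_bits(sens))  # -> [8 4 8 4 8 4]
```

A random baseline, by contrast, would assign bits with no regard to `sens` at all, which is exactly why it bounds these methods from below.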
113
+
114
+ ## Files Included
115
+
116
+ - `model.safetensors` or `pytorch_model.bin`: Complete model with quantized weights
117
+ - `config.json`: Model configuration
118
+ - `tokenizer.json`, `tokenizer_config.json`: Tokenizer files
119
+ - `layer_configs.json`: Per-layer quantization settings
120
+ - `quantization_info.json`: Quantization metadata
121
+ - `compression_stats.json`: Compression statistics
122
+
123
+ ## Citation
124
+
125
+ ```bibtex
126
+ @misc{deepseek-ocr-random-quantized,
127
+ title={DeepSeek-OCR Random Quantized Model},
128
+ author={SamMikaelson},
129
+ year={2024},
130
+ publisher={Hugging Face},
131
+ howpublished={\url{https://huggingface.co/SamMikaelson/deepseek-ocr-int8-quantized}}
132
+ }
133
+ ```
134
+
135
+ Original DeepSeek-OCR model: [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
136
+
137
+ ## License
138
+
139
+ Apache 2.0 (same as base model)
compression_stats.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "original_params": 3336106240,
+   "quantized_layers": 2342,
+   "uniform_bits": 8,
+   "avg_bit_width": 8.0,
+   "original_size_mb": 6363.11767578125,
+   "compressed_size_mb": 3351.557418823242,
+   "compression_ratio": 1.898555471568018,
+   "vision_layers_quantized": 96,
+   "language_layers_quantized": 2197,
+   "actual_size_reduction": true,
+   "method": "uniform"
+ }
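The size figures in this file are mutually consistent, which is easy to verify. A quick check with the values copied inline (in practice you would `json.load` the file from the repo root):

```python
# Values taken from compression_stats.json above
stats = {
    "original_size_mb": 6363.11767578125,
    "compressed_size_mb": 3351.557418823242,
    "compression_ratio": 1.898555471568018,
}

# Ratio and reduction derived from the two sizes
ratio = stats["original_size_mb"] / stats["compressed_size_mb"]
reduction = 1 - stats["compressed_size_mb"] / stats["original_size_mb"]
print(f"{ratio:.2f}x, {reduction:.1%} smaller")  # -> 1.90x, 47.3% smaller
```

These match the "1.90x" ratio and "~47.3%" reduction quoted in the README.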
config.json ADDED
@@ -0,0 +1,135 @@
+ {
+   "_name_or_path": "deepseek-ai/DeepSeek-OCR",
+   "architectures": [
+     "DeepseekOCRForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "deepseek-ai/DeepSeek-OCR--modeling_deepseekocr.DeepseekOCRConfig",
+     "AutoModel": "deepseek-ai/DeepSeek-OCR--modeling_deepseekocr.DeepseekOCRForCausalLM"
+   },
+   "aux_loss_alpha": 0.001,
+   "bos_token_id": 0,
+   "candidate_resolutions": [
+     [
+       1024,
+       1024
+     ]
+   ],
+   "eos_token_id": 1,
+   "ep_size": 1,
+   "first_k_dense_replace": 1,
+   "global_view_pos": "head",
+   "hidden_act": "silu",
+   "hidden_size": 1280,
+   "initializer_range": 0.02,
+   "intermediate_size": 6848,
+   "kv_lora_rank": null,
+   "language_config": {
+     "architectures": [
+       "DeepseekV2ForCausalLM"
+     ],
+     "auto_map": {
+       "AutoConfig": "configuration_deepseekv2.DeepseekV2Config",
+       "AutoModel": "modeling_deepseek.DeepseekV2Model",
+       "AutoModelForCausalLM": "modeling_deepseek.DeepseekV2ForCausalLM"
+     },
+     "bos_token_id": 0,
+     "eos_token_id": 1,
+     "first_k_dense_replace": 1,
+     "hidden_size": 1280,
+     "intermediate_size": 6848,
+     "kv_lora_rank": null,
+     "lm_head": true,
+     "max_position_embeddings": 8192,
+     "moe_intermediate_size": 896,
+     "n_group": 1,
+     "n_routed_experts": 64,
+     "n_shared_experts": 2,
+     "num_attention_heads": 10,
+     "num_experts_per_tok": 6,
+     "num_hidden_layers": 12,
+     "num_key_value_heads": 10,
+     "q_lora_rank": null,
+     "qk_nope_head_dim": 0,
+     "qk_rope_head_dim": 0,
+     "rm_head": false,
+     "topk_group": 1,
+     "topk_method": "greedy",
+     "torch_dtype": "bfloat16",
+     "use_mla": false,
+     "v_head_dim": 0,
+     "vocab_size": 129280
+   },
+   "lm_head": true,
+   "max_position_embeddings": 8192,
+   "model_type": "DeepseekOCR",
+   "moe_intermediate_size": 896,
+   "moe_layer_freq": 1,
+   "n_group": 1,
+   "n_routed_experts": 64,
+   "n_shared_experts": 2,
+   "norm_topk_prob": false,
+   "num_attention_heads": 10,
+   "num_experts_per_tok": 6,
+   "num_hidden_layers": 12,
+   "num_key_value_heads": 10,
+   "pretraining_tp": 1,
+   "projector_config": {
+     "input_dim": 2048,
+     "model_type": "mlp_projector",
+     "n_embed": 1280,
+     "projector_type": "linear"
+   },
+   "q_lora_rank": null,
+   "qk_nope_head_dim": 0,
+   "qk_rope_head_dim": 0,
+   "rm_head": false,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "routed_scaling_factor": 1.0,
+   "scoring_func": "softmax",
+   "seq_aux": true,
+   "tie_word_embeddings": false,
+   "tile_tag": "2D",
+   "topk_group": 1,
+   "topk_method": "greedy",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.46.3",
+   "use_cache": true,
+   "use_mla": false,
+   "v_head_dim": 0,
+   "vision_config": {
+     "image_size": 1024,
+     "mlp_ratio": 3.7362,
+     "model_name": "deeplip_b_l",
+     "model_type": "vision",
+     "width": {
+       "clip-l-14-224": {
+         "heads": 16,
+         "image_size": 224,
+         "layers": 24,
+         "patch_size": 14,
+         "width": 1024
+       },
+       "sam_vit_b": {
+         "downsample_channels": [
+           512,
+           1024
+         ],
+         "global_attn_indexes": [
+           2,
+           5,
+           8,
+           11
+         ],
+         "heads": 12,
+         "layers": 12,
+         "width": 768
+       }
+     }
+   },
+   "vocab_size": 129280
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "transformers_version": "4.46.3"
+ }
layer_configs.json ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66cfd58eaa2d4a6418f7573a9691f9c24515c7a147b26bbc443c0387d7f3c2f2
+ size 3515210600
quantization_info.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "additional_special_tokens": [
+     "<|User|>",
+     "<|Assistant|>"
+   ],
+   "bos_token": {
+     "content": "<|begin▁of▁sentence|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|end▁of▁sentence|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|▁pad▁|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff