SamMikaelson committed
Commit 46cf7ea · verified · 1 parent: 02ad03f

Upload folder using huggingface_hub

README.md ADDED
---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Other Layers**: 49 @ 8-bit (see `layer_analysis.json`)
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x

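The per-tensor symmetric scheme these numbers imply can be sketched as follows. This is a minimal illustration, not the repo's actual quantization code, and the function names are made up here: the largest absolute weight maps to code 127, and every weight is rounded onto that grid.

```python
def quantize_int8(weights):
    """Per-tensor symmetric INT8 quantization: the largest absolute
    weight maps to 127; every other weight is rounded onto that grid."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from INT8 codes and the scale."""
    return [c * scale for c in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8(w)   # q = [50, -127, 3], scale = 0.01
w_hat = dequantize_int8(q, scale)
```

Storing one int8 code per weight plus a single float scale per tensor is what drives the roughly 2x size reduction reported above.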
## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True
)

# Download and load the quantized weights (CPU-safe)
weights_path = hf_hub_download(
    repo_id="SamMikaelson/deepseek-ocr-int8-quantized",
    filename="quantized_weights.pt",
)
state_dict = torch.load(weights_path, map_location="cpu")
# Note: you will need the QuantizedLinear class to properly load and use this model
```

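Since the `QuantizedLinear` class itself is not included in this card, here is a minimal stand-in showing what such a layer typically does: hold the INT8 codes plus the saved scale and dequantize on the fly at call time. The actual class in this repo may differ.

```python
class QuantizedLinear:
    """Minimal stand-in (NOT the repo's actual class): stores INT8 weight
    codes and one per-tensor float scale; dequantizes at call time."""

    def __init__(self, q_weight, scale, bias=None):
        self.q_weight = q_weight  # int8 codes, rows of shape (out, in)
        self.scale = scale        # per-tensor scale saved at quantization time
        self.bias = bias or [0.0] * len(q_weight)

    def __call__(self, x):
        # Dequantize each weight (code * scale) and take the dot products.
        return [
            sum(c * self.scale * xi for c, xi in zip(row, x)) + b
            for row, b in zip(self.q_weight, self.bias)
        ]

# Codes [[100, -50]] at scale 0.01 represent the weights [[1.0, -0.5]]
layer = QuantizedLinear(q_weight=[[100, -50]], scale=0.01)
y = layer([2.0, 4.0])  # 2*1.0 + 4*(-0.5) = 0.0
```

A real implementation would be a `torch.nn.Module` operating on tensors, but the bookkeeping (codes + scale, dequantize before the matmul) is the same idea.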
## Baseline Characteristics

This uniform quantization approach:

- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods

## Citation

If you use this model, please cite the original model and mention the uniform quantization approach.
compression_stats.json ADDED
{
  "original_params": 3336106240,
  "quantized_layers": 2342,
  "uniform_bits": 8,
  "avg_bit_width": 8.0,
  "original_size_mb": 6363.11767578125,
  "compressed_size_mb": 3351.557418823242,
  "compression_ratio": 1.898555471568018,
  "vision_layers_quantized": 96,
  "language_layers_quantized": 2197,
  "actual_size_reduction": true,
  "method": "uniform"
}
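A quick sanity check on these figures, assuming the original weights are FP16 (2 bytes per parameter): 3,336,106,240 parameters reproduce the reported original size exactly, and an ideal all-INT8 file would be half of it. The roughly 170 MB above that ideal in the compressed size is scale/metadata overhead plus parameters left unquantized.

```python
params = 3_336_106_240
orig_mb = params * 2 / 1024**2       # FP16: 2 bytes per parameter
ideal_int8_mb = params / 1024**2     # pure INT8: 1 byte per parameter
compressed_mb = 3351.557418823242    # reported value from the stats above
ratio = orig_mb / compressed_mb      # matches the reported 1.8986x
overhead_mb = compressed_mb - ideal_int8_mb
```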
layer_analysis.json ADDED
layer_configs.json ADDED
quantization_info.json ADDED
quantized_weights.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
size 3516781114
special_tokens_map.json ADDED
{
  "additional_special_tokens": [
    "<|User|>",
    "<|Assistant|>"
  ],
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|▁pad▁|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
tokenizer_config.json ADDED