---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR): the same 8-bit scheme is applied to every quantized layer, regardless of modality.

## Quantization Details

- **Method**: Uniform INT8 quantization
- **Quantized Layers**: 2342
- **Vision Layers**: 96 @ 8-bit
- **Language Layers**: 2197 @ 8-bit
- **Other Layers**: 49 @ 8-bit (the remaining layers in the "other" modality category)
- **Average Bit-width**: 8.00
- **Original Size**: 6363.12 MB
- **Compressed Size**: 3351.56 MB
- **Compression Ratio**: 1.90x

## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized",
    trust_remote_code=True,
)

# Load the quantized weights (INT8 tensors plus quantization metadata)
state_dict = torch.load("quantized_weights.pt", map_location="cpu")
```

Note that the checkpoint stores raw quantized weights rather than a ready-to-run `transformers` model: you will need a `QuantizedLinear` module to dequantize and use them. A minimal sketch of such a class is given at the end of this card.

## Baseline Characteristics

This uniform quantization approach:

- Applies the **same 8-bit** quantization to ALL layers
- **Does not distinguish** between vision and language modalities
- Serves as a **baseline** for comparison with modality-aware methods

## Citation

If you use this model, please cite the original [DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) model and mention the uniform quantization approach.
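
## Appendix: Minimal `QuantizedLinear` Sketch

The exact serialization format of `quantized_weights.pt` is described in `quantization_info.json` and `layer_configs.json`. As a starting point, below is a minimal, hypothetical `QuantizedLinear` sketch assuming symmetric per-tensor INT8 quantization, i.e. `q = round(w / s)` with `s = max|w| / 127`; the tensor layout, names, and helper `quantize_tensor` are illustrative assumptions, not the checkpoint's confirmed format.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_tensor(w: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor INT8: s = max|w| / 127, q = clamp(round(w / s))."""
    scale = w.abs().max().item() / 127.0 or 1e-8  # guard against all-zero tensors
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale


class QuantizedLinear(nn.Module):
    """Linear layer that stores INT8 weights and dequantizes on the fly.

    Hypothetical layout: an int8 weight matrix, one float scale per layer,
    and an optional full-precision bias. Check quantization_info.json for
    the format this checkpoint actually uses.
    """

    def __init__(self, weight_int8, scale, bias=None):
        super().__init__()
        self.register_buffer("weight_int8", weight_int8)            # (out, in), torch.int8
        self.register_buffer("scale", torch.tensor(float(scale)))   # dequantization scale
        self.bias = None if bias is None else nn.Parameter(bias)

    def forward(self, x):
        # Dequantize (w ~= weight_int8 * scale), then apply a regular linear transform
        weight = self.weight_int8.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, weight, self.bias)


# Round-trip example on a random weight matrix
w = torch.randn(16, 32)
q, s = quantize_tensor(w)
layer = QuantizedLinear(q, s, bias=torch.zeros(16))
y = layer(torch.randn(4, 32))
print(y.shape)                          # torch.Size([4, 16])
print((w - q.float() * s).abs().max())  # quantization error, bounded by s / 2
```

Under this scheme, each weight matrix stores one float scale alongside its int8 values, which is what yields the roughly 2x size reduction reported above (fp16 to int8 plus per-layer metadata).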