---
license: mit
base_model: deepseek-ai/DeepSeek-OCR
tags:
- quantization
- int8
- uniform-quantization
- model-compression
---

# Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

## Quantization Details

- Method: Uniform INT8 quantization
- Quantized Layers: 2342
- Vision Layers: 96 @ 8-bit
- Language Layers: 2197 @ 8-bit
- Other Layers: 49 @ 8-bit (the remainder: 2342 - 96 - 2197)
- Average Bit-width: 8.00 bits
- Original Size: 6363.12 MB
- Compressed Size: 3351.56 MB
- Compression Ratio: 1.90x (original size / compressed size)
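The size figures above can be re-derived with a few lines of Python. This sketch reuses only the listed numbers. One input is an assumption that the card does not state: the original checkpoint stores its weights in 16 bits (for example BF16), so a pure INT8 copy with no extra metadata would be exactly half the size.

```python
# Figures copied from the Quantization Details list.
ORIGINAL_MB = 6363.12
COMPRESSED_MB = 3351.56


def compression_summary(original_mb: float, compressed_mb: float) -> dict:
    """Return the compression ratio, space saved, and size above an ideal 2x cut."""
    ratio = original_mb / compressed_mb
    saved_mb = original_mb - compressed_mb
    # Assumption: the original weights are 16-bit, so ideal INT8 storage is half the size.
    ideal_int8_mb = original_mb / 2
    overhead_mb = compressed_mb - ideal_int8_mb
    return {
        "ratio": round(ratio, 2),
        "saved_mb": round(saved_mb, 2),
        "ideal_int8_mb": round(ideal_int8_mb, 2),
        "overhead_mb": round(overhead_mb, 2),
    }


summary = compression_summary(ORIGINAL_MB, COMPRESSED_MB)
print(summary)
```

With these inputs the ratio rounds to 1.9, and under the 16-bit assumption the compressed file is about 170 MB larger than an exact halving. Per-layer scale factors or tensors kept in higher precision could account for such a gap, but the card does not say what makes up the difference.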

## Model Files

- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)

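## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load the tokenizer from the base model (weight quantization does not change it)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-OCR", trust_remote_code=True)

# Download the quantized weights from this repository and load them on CPU
weights_path = hf_hub_download(repo_id="SamMikaelson/OLD_INT_8", filename="quantized_weights.pt")
# PyTorch >= 2.6 defaults to weights_only=True; if loading fails because the file
# holds non-tensor objects, pass weights_only=False only if you trust the file.
state_dict = torch.load(weights_path, map_location="cpu")

# Note: the QuantizedLinear class used to create these weights is not among the files
# listed above; you will need it (or a compatible implementation) to rebuild and run the model.
```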
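The card does not ship the QuantizedLinear class mentioned in the usage note. To give a rough idea of what such a module might do, here is a sketch under a hypothetical storage layout: one int8 weight matrix plus one float scale per output row (symmetric, per-channel, no zero point). The class body, the helper `quantize_per_row`, and that layout are all assumptions; inspect `state_dict.keys()` and the tensor dtypes before adapting anything like this to `quantized_weights.pt`.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizedLinear(nn.Module):
    """Hypothetical INT8 linear layer: int8 weights plus one float32 scale per output row.

    Illustration only: the layer used to produce quantized_weights.pt may use
    different tensor names, scale shapes, or zero points.
    """

    def __init__(self, weight_int8: torch.Tensor, scale: torch.Tensor,
                 bias: Optional[torch.Tensor] = None):
        super().__init__()
        # Buffers, not Parameters: quantized weights are not meant to be trained.
        self.register_buffer("weight_int8", weight_int8.to(torch.int8))
        self.register_buffer("scale", scale.to(torch.float32).reshape(-1, 1))
        self.register_buffer("bias", None if bias is None else bias.to(torch.float32))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly (w ~= q * scale), then apply a regular linear map.
        weight = self.weight_int8.to(torch.float32) * self.scale
        return F.linear(x.to(torch.float32), weight, self.bias)


def quantize_per_row(w: torch.Tensor):
    """Symmetric per-row INT8 quantization matching the hypothetical layout above."""
    max_abs = w.abs().amax(dim=1)
    # All-zero rows get scale 1.0 to avoid dividing by zero.
    scale = torch.where(max_abs > 0, max_abs / 127.0, torch.ones_like(max_abs))
    q = torch.round(w / scale[:, None]).clamp(-127, 127).to(torch.int8)
    return q, scale


# Usage: quantize a random float layer and compare its output with the float version.
torch.manual_seed(0)
w = torch.randn(4, 8)
q, scale = quantize_per_row(w)
layer = QuantizedLinear(q, scale)
x = torch.randn(2, 8)
print((layer(x) - x @ w.T).abs().max().item())
```

Dequantizing inside `forward` keeps the sketch short but gives up any speed benefit of INT8 matrix multiplication; the saving here is only in how the weights are stored.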

## Baseline Characteristics

This uniform quantization approach:
- Applies the same 8-bit quantization to all layers
- Does not distinguish between vision and language modalities
- Serves as a baseline for comparison with modality-aware methods
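A minimal NumPy sketch of such a modality-agnostic baseline follows. It assumes symmetric per-row scaling with integer levels in [-127, 127]; the card does not say whether scales are per-tensor or per-channel, or whether a zero point is used. The layer names are hypothetical stand-ins for one vision weight and one language weight.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-row INT8 quantization of a 2-D weight matrix: w ~= q * scale."""
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # One scale per output row; all-zero rows get scale 1.0 to avoid dividing by zero.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale


def quantize_all_layers(weights: dict[str, np.ndarray]) -> dict[str, tuple[np.ndarray, np.ndarray]]:
    # Uniform baseline: every layer is quantized the same way at 8 bits.
    # The layer name (vision, language, or other) is never inspected.
    return {name: quantize_int8(w) for name, w in weights.items()}


# Hypothetical layer names standing in for a vision and a language projection.
rng = np.random.default_rng(0)
weights = {
    "vision.blocks.0.attn.qkv": rng.standard_normal((6, 4)).astype(np.float32),
    "language.layers.0.mlp.up_proj": rng.standard_normal((8, 4)).astype(np.float32),
}
quantized = quantize_all_layers(weights)
for name, (q, scale) in quantized.items():
    err = np.abs(dequantize_int8(q, scale) - weights[name]).max()
    print(f"{name}: dtype={q.dtype}, max abs error={err:.4f}")
```

A modality-aware variant could, for example, choose a bit-width per layer depending on whether its name belongs to the vision or language stack; the uniform baseline skips that decision entirely.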

## Citation

If you use this model, please cite the original DeepSeek-OCR model and note that this checkpoint was produced with uniform INT8 quantization.