SamMikaelson committed on
Commit f34ca0c · verified · 1 Parent(s): 0c6623c

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +33 -115
  2. layer_analysis.json +0 -0
  3. quantized_weights.pt +3 -0
README.md CHANGED
@@ -1,139 +1,57 @@
  ---
- language: en
- license: apache-2.0
  tags:
  - quantization
- - deepseek
- - ocr
- - document-understanding
- - random-quantization
- base_model: deepseek-ai/DeepSeek-OCR
- pipeline_tag: image-to-text
  ---

- # DeepSeek-OCR Random Quantized Model (Standalone)

- This is a **fully standalone randomly quantized** version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

- ⚠️ **Note**: This model uses random quantization as a baseline for comparison. It is NOT optimized and will have significant quality degradation. This serves as a lower bound for intelligent quantization methods.

- ## Model Details
-
- ### Quantization Statistics
- - **Method**: Random Quantization (Baseline)
- - **Compression Ratio**: 1.90x
- - **Average Bit-Width**: 8.00 bits
  - **Original Size**: 6363.12 MB
  - **Compressed Size**: 3351.56 MB
- - **Size Reduction**: ~47.3%
-
- ### Architecture
- Based on DeepSeek-OCR with custom `QuantizedLinear` layers that perform on-the-fly dequantization during inference.
-
- ## Usage
-
- ### Basic Loading
-
- ```python
- from transformers import AutoModel, AutoTokenizer
- import torch

- # Load model and tokenizer (no base model needed!)
- model_name = "SamMikaelson/deepseek-ocr-int8-quantized"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- model = AutoModel.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16
- ).to("cuda")

- # The model is ready to use!
- ```

- ### For Document OCR

  ```python
- from transformers import AutoProcessor
  import torch
- from PIL import Image
-
- # Load
- processor = AutoProcessor.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)
- model = AutoModel.from_pretrained(
-     "SamMikaelson/deepseek-ocr-int8-quantized",
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16
- ).to("cuda")

- # Inference
- image = Image.open("document.jpg")
- prompt = "<image>\n<|grounding|>Convert the document to markdown."

- # Process and generate
- inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
- outputs = model.generate(**inputs, max_length=2048)
- result = processor.decode(outputs[0], skip_special_tokens=True)
-
- print(result)
  ```

- ## Performance Characteristics
-
- ### Quality Metrics (Expected)
- - **NLS (Normalized Levenshtein Similarity)**: Significantly degraded (~0.01-0.1)
- - **WER (Word Error Rate)**: High error rate (20-50)
- - **Output Generation**: May produce nonsensical outputs due to random quantization
-
- ### Speed Metrics
- - **Inference Latency**: Comparable to original (dequantization overhead)
- - **Memory Usage**: ~47.3% reduction
-
- ## Limitations
-
- ⚠️ **This is a baseline model for research purposes:**
-
- 1. **Quality Degradation**: Random quantization severely impacts model quality
- 2. **Not Production-Ready**: This model is for comparison/research only
- 3. **Baseline Purpose**: Demonstrates the lower bound of quantization quality
-
- ### Why This Model Exists
-
- This model serves as a **sanity check** and **lower bound** for intelligent quantization methods:
- - Shows what happens with no quantization intelligence
- - Provides a baseline to compare against optimized methods
- - Validates that your evaluation metrics can detect poor quantization

- ## Better Alternatives
-
- For production use, consider:
- - **Sensitivity-aware quantization**: Quantize less important layers more aggressively
- - **Mixed-precision methods**: Use different bit-widths per layer based on importance
- - **Quantization-aware training**: Fine-tune after quantization
- - **GPTQ/AWQ**: State-of-the-art quantization methods
-
- ## Files Included
-
- - `model.safetensors` or `pytorch_model.bin`: Complete model with quantized weights
- - `config.json`: Model configuration
- - `tokenizer.json`, `tokenizer_config.json`: Tokenizer files
- - `layer_configs.json`: Per-layer quantization settings
- - `quantization_info.json`: Quantization metadata
- - `compression_stats.json`: Compression statistics

  ## Citation

- ```bibtex
- @misc{deepseek-ocr-random-quantized,
-   title={DeepSeek-OCR Random Quantized Model},
-   author={SamMikaelson},
-   year={2024},
-   publisher={Hugging Face},
-   howpublished={\url{https://huggingface.co/SamMikaelson/deepseek-ocr-int8-quantized}}
- }
- ```
-
- Original DeepSeek-OCR model: [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
-
- ## License
-
- Apache 2.0 (same as base model)
 
  ---
+ license: mit
+ base_model: deepseek-ai/DeepSeek-OCR
  tags:
  - quantization
+ - int8
+ - uniform-quantization
+ - model-compression
  ---

+ # Uniform INT8 Quantized DeepSeek-OCR

+ This model is a uniformly quantized version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).

+ ## Quantization Details

+ - **Method**: Uniform INT8 quantization
+ - **Quantized Layers**: 2342
+ - **Vision Layers**: 96 @ 8-bit
+ - **Language Layers**: 2197 @ 8-bit
+ - **Average Bit-width**: 8.00
  - **Original Size**: 6363.12 MB
  - **Compressed Size**: 3351.56 MB
+ - **Compression Ratio**: 1.90x
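The size figures quoted in the card are internally consistent, and the ratio lands slightly under the ideal 2x for a bf16-to-int8 conversion, presumably because quantization scale metadata and any unquantized tensors remain at full precision. A quick arithmetic check on the numbers above:

```python
# Sanity-check the compression statistics quoted in the model card.
original_mb = 6363.12    # bf16 checkpoint size from the card
compressed_mb = 3351.56  # quantized checkpoint size from the card

ratio = original_mb / compressed_mb
reduction = 1 - compressed_mb / original_mb

print(f"compression ratio: {ratio:.2f}x")    # matches the card's 1.90x
print(f"size reduction:    {reduction:.1%}") # matches the old card's ~47.3%
```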
+ ## Model Files

+ - `quantized_weights.pt`: Quantized model weights
+ - `quantization_info.json`: Layer-wise quantization configuration
+ - `layer_configs.json`: Detailed layer configurations
+ - `compression_stats.json`: Compression statistics
+ - `layer_analysis.json`: Modality analysis (vision/language/other)

+ ## Usage

  ```python
  import torch
+ from transformers import AutoTokenizer

+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)

+ # Load quantized weights
+ state_dict = torch.load("quantized_weights.pt")
+ # Note: You'll need the QuantizedLinear class to properly load and use this model
  ```
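The new card's usage snippet notes that a `QuantizedLinear` class is required but does not show it. As a rough illustration only — the actual class ships with the repo's `trust_remote_code` sources and its layout may differ — an on-the-fly dequantizing linear layer with symmetric per-channel INT8 weights can be sketched like this:

```python
import torch
import torch.nn as nn


class QuantizedLinear(nn.Module):
    """Hypothetical sketch: int8 weights + per-output-channel scales,
    dequantized on the fly in forward(). Not the repo's actual class."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.register_buffer(
            "qweight", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scale", torch.ones(out_features, 1))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    @classmethod
    def from_float(cls, linear: nn.Linear):
        # Symmetric per-channel quantization: scale = max|w| / 127
        q = cls(linear.in_features, linear.out_features, linear.bias is not None)
        w = linear.weight.detach().float()
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        q.qweight.copy_(torch.round(w / scale).clamp(-127, 127).to(torch.int8))
        q.scale.copy_(scale)
        if linear.bias is not None:
            q.bias.data.copy_(linear.bias.detach())
        return q

    def forward(self, x):
        # Dequantize weights on the fly, then apply a standard linear map
        w = self.qweight.float() * self.scale
        return nn.functional.linear(x, w.to(x.dtype), self.bias)
```

With a module like this, the loaded `state_dict` would be mapped onto a model whose `nn.Linear` layers were swapped for `QuantizedLinear` instances; the per-layer settings in `layer_configs.json` presumably drive that swap.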
+ ## Baseline Characteristics

+ This uniform quantization approach:
+ - Applies the **same 8-bit** quantization to ALL layers
+ - **Does not distinguish** between vision and language modalities
+ - Serves as a **baseline** for comparison with modality-aware methods

  ## Citation

+ If you use this model, please cite the original model and mention the uniform quantization approach.
layer_analysis.json ADDED
The diff for this file is too large to render. See raw diff
 
quantized_weights.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17858a6f6131abb66d810483239856f6df98249e477e079e1368aac7b1965ada
+ size 3516781114