---
title: GLM-OCR Pruned 8-bit Safetensors (1.3GB)
emoji: 📄
license: mit
language:
- en
- fr
- es
- ru
- de
- ja
- ko
- zh
base_model:
- zai-org/GLM-OCR
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- pruning
- bitsandbytes
- int8
---

# GLM-OCR-Pruned-8bit

**Production GLM-OCR: 52% smaller (2.7 GB → 1.3 GB), fully 8-bit, OCR-optimized**

## Performance

| Metric | Original | **Optimized** |
|--------|----------|---------------|
| **Parameters** | 1.1B | **1.1B (4.3% pruned)** |
| **Disk** | 2.7 GB | **1.3 GB** (52% smaller) |
| **GPU memory** | 3.5 GB+ | **2.3 GB** |
| **Speed** | 1x | **2-3x** |

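The disk figures in the table can be sanity-checked with back-of-envelope arithmetic (a rough sketch only: the real checkpoints also carry embeddings, unquantized protected layers, and metadata, so actual sizes run higher than these lower bounds):

```python
def approx_size_gb(n_params, bytes_per_param, sparsity=0.0):
    """Rough checkpoint size: surviving parameters x bytes per parameter."""
    return n_params * (1.0 - sparsity) * bytes_per_param / 1e9

fp16_gb = approx_size_gb(1.1e9, 2)           # dense fp16 baseline, ~2.2 GB
int8_gb = approx_size_gb(1.1e9, 1, 0.043)    # 8-bit with 4.3% pruning, ~1.05 GB

print(round(fp16_gb, 2), round(int8_gb, 2))
```

The headline 2.7 GB / 1.3 GB figures sit above these estimates because some tensors stay in higher precision.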
## Quickstart
```python
from transformers import BitsAndBytesConfig, AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "ManiKumarAdapala/glm-ocr-pruned-8bit"

# Chat-style input: one image plus an OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "Image.jpeg"},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

# Load the weights in 8-bit via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# The model's forward pass does not accept token_type_ids.
inputs.pop("token_type_ids", None)

generated_ids = model.generate(**inputs, max_new_tokens=8192)

# Decode only the newly generated tokens; special tokens are kept because
# they may carry layout markers (set skip_special_tokens=True to strip them).
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)

print(output_text)
```
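The decode step above slices off the prompt tokens that `generate` echoes back before the new output. The same logic on plain lists (a toy illustration; the token ids are invented):

```python
def new_tokens(generated_ids, prompt_len):
    """Keep only the tokens produced after the prompt."""
    return generated_ids[prompt_len:]

# Toy example: 4 prompt tokens followed by 2 generated tokens.
print(new_tokens([101, 7, 8, 9, 200, 300], 4))  # [200, 300]
```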

## Optimizations Applied

- ✅ Selective pruning of q_proj, v_proj, fc2, and vision_tower layers (4.3% sparsity)
- ✅ BitsAndBytes 8-bit quantization: Linear8bitLt applied to both the vision tower and the text decoder
- ✅ Protected (kept intact): lm_head, early vision layers, final decoder layers

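The pruning script itself is not included in this repository. As a minimal sketch of unstructured magnitude pruning in plain Python (an assumption about the method; the actual selection criterion and granularity may differ):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])  # indices of the k smallest-magnitude weights
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

# Prune 25% of a toy weight vector: the smallest entry (-0.01) is zeroed.
print(magnitude_prune([0.5, -0.01, 2.0, 0.3], 0.25))  # [0.5, 0.0, 2.0, 0.3]
```

In practice this is done per tensor on the targeted projections (q_proj, v_proj, fc2, vision_tower), leaving the protected layers untouched.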
## Citation

```bibtex
@misc{GLM-OCR-Pruned8bit-2026,
  author    = {Adapala, Mani Kumar and {ZAI-org}},
  title     = {GLM-OCR Pruned \& 8-bit Quantized (1.1B params, 4.3\% sparsity)},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ManiKumarAdapala/glm-ocr-pruned-8bit},
  note      = {1.3GB disk, 2.3GB GPU, OCR optimized, MIT license}
}
```

<font size="2">

**Acknowledgements (from ZAI-org/GLM-OCR)**

This project is inspired by the excellent work of:
- [PP-DocLayout-V3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) (Apache 2.0)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [MinerU](https://github.com/opendatalab/MinerU)

**License Notice**: The GLM-OCR model is MIT-licensed. When using the complete OCR pipeline, users should comply with the Apache License 2.0 for PP-DocLayoutV3 components.
</font>