---
title: GLM-OCR Pruned 8-bit Safetensors (1.3GB)
emoji: 🚀
license: mit
language:
  - en
  - fr
  - es
  - ru
  - de
  - ja
  - ko
  - zh
base_model:
  - zai-org/GLM-OCR
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - pruning
  - bitsandbytes
  - int8
---

# GLM-OCR-Pruned-8bit

![Model Size](https://img.shields.io/badge/Disk-1.3GB-brightgreen) ![GPU](https://img.shields.io/badge/GPU-2.3GB-blue) ![Quant](https://img.shields.io/badge/8--bit-✅-orange)

**Production-ready GLM-OCR: 52% smaller on disk (2.7GB → 1.3GB), fully 8-bit, optimized for OCR**

## 📊 Performance

| Metric | Original | **Optimized** |
|--------|----------|---------------|
| **Parameters** | 1.1B | **1.1B (4.3% pruned)** |
| **Disk** | 2.7GB | **1.3GB** (52%↓) |
| **GPU memory** | 3.5GB+ | **2.3GB** |
| **Inference speed** | 1x | **2-3x** |

## 🚀 Quickstart

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

MODEL_PATH = "ManiKumarAdapala/glm-ocr-pruned-8bit"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "Image.jpeg"},  # path or URL to your document image
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

quant_config = BitsAndBytesConfig(load_in_8bit=True)

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_PATH,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)

generated_ids = model.generate(**inputs, max_new_tokens=8192)
# Decode only the newly generated tokens (skip the prompt)
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(output_text)
```

## 🛠 Optimizations Applied

- ✅ **Selective pruning**: q_proj, v_proj, fc2, and vision_tower layers (52% disk reduction combined with quantization)
- ✅ **BitsAndBytes 8-bit**: `Linear8bitLt` across the vision and text decoder layers
- ✅ **Protected modules**: lm_head, early vision layers, and final decoder layers are left untouched

## 📚 Citation

```bibtex
@misc{GLM-OCR-Pruned8bit-2026,
  author    = {Adapala Mani Kumar and {ZAI-org}},
  title     = {GLM-OCR Pruned \& 8-bit Quantized (1.1B params, 4.3\% sparsity)},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adapala-manikumar/glm-ocr-pruned-8bit},
  note      = {1.3GB disk, 2.3GB GPU, OCR optimized, MIT license}
}
```

**Acknowledgements (from ZAI-org/GLM-OCR)**

This project is inspired by the excellent work of:
- [PP-DocLayout-V3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) (Apache 2.0)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [MinerU](https://github.com/opendatalab/MinerU)

**License Notice**: The GLM-OCR model is MIT licensed. When using the complete OCR pipeline, users should comply with the Apache License 2.0 for the PP-DocLayoutV3 components.
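As a rough illustration of the selective pruning step listed above, unstructured magnitude pruning zeroes the smallest-magnitude weights in a targeted projection. This is a minimal NumPy sketch under assumed details (per-tensor thresholding, 4.3% sparsity), not the exact recipe used to produce this checkpoint:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    Illustrative only: the actual checkpoint's pruning procedure may differ
    (e.g. per-layer or structured criteria).
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Hypothetical stand-in for one q_proj weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.043)  # 4.3% sparsity, as reported above
print(f"sparsity: {np.mean(pruned == 0):.3f}")
```

Pruned weights stored as zeros compress well on disk, which is where the reduction compounds with 8-bit quantization.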