GLM-OCR-Pruned-8bit


Production GLM-OCR: 52% smaller (2.7GB→1.3GB), fully 8-bit, OCR optimized

πŸ“Š Performance

| Metric     | Original | Optimized          |
|------------|----------|--------------------|
| Parameters | 1.1B     | 1.1B (4.3% pruned) |
| Disk       | 2.7GB    | 1.3GB (52%↓)       |
| GPU memory | 3.5GB+   | 2.3GB              |
| Speed      | 1x       | 2–3x               |
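The disk savings come mostly from storing weights as int8 instead of 16/32-bit floats. The sketch below shows per-tensor absmax quantization; this is illustrative only, since bitsandbytes' `Linear8bitLt` actually uses vector-wise scales with mixed-precision outlier handling:

```python
import torch

def absmax_quantize(w: torch.Tensor):
    """Quantize a float tensor to int8 with a single absmax scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(8, 8)
q, scale = absmax_quantize(w)
w_hat = q.float() * scale                      # dequantize
max_err = (w_hat - w).abs().max().item()       # bounded by scale / 2
print(q.dtype)  # torch.int8
```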

πŸš€ Quickstart

```python
from transformers import BitsAndBytesConfig, AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "ManiKumarAdapala/glm-ocr-pruned-8bit"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "Image.jpeg"},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

# Load the weights in 8-bit via bitsandbytes
quant_config = BitsAndBytesConfig(load_in_8bit=True)

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# The model's forward pass does not accept token_type_ids
inputs.pop("token_type_ids", None)

generated_ids = model.generate(**inputs, max_new_tokens=8192)

# Decode only the newly generated tokens, skipping the prompt
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)

print(output_text)
```
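Because the quickstart decodes with `skip_special_tokens=False`, chat-template markers such as `<|endoftext|>` may remain in the output. A small post-processing helper can strip them; the `<|...|>` marker format is an assumption about this tokenizer, not something stated on the card:

```python
import re

def strip_special_tokens(text: str) -> str:
    """Remove <|...|>-style special-token markers (assumed format)."""
    return re.sub(r"<\|[^|>]*\|>", "", text).strip()

print(strip_special_tokens("Recognized text.<|endoftext|>"))  # Recognized text.
```

Alternatively, pass `skip_special_tokens=True` to `processor.decode` if you do not need the markers.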

πŸ›  Optimizations Applied

  • βœ… Selective Pruning: q_proj, v_proj, fc2, vision_tower (52%)
  • βœ… BitsAndBytes 8-bit: Linear8bitLt (vision+text decoder)
  • βœ… Protected: lm_head, early vision, final decoder layers

πŸ“š Citation

@misc{GLM-OCR-Pruned8bit-2026,
  author    = {Adapala, Mani Kumar and {ZAI-org}},
  title     = {GLM-OCR Pruned \& 8-bit Quantized (1.1B params, 4.3\% sparsity)},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adapala-manikumar/glm-ocr-pruned-8bit},
  note      = {1.3GB disk, 2.3GB GPU, OCR optimized, MIT license}
}

Acknowledgements (from ZAI-org/GLM-OCR)

This project is inspired by the excellent work of the ZAI-org team on GLM-OCR.

License Notice: The GLM-OCR model is MIT licensed. When using the complete OCR pipeline, users should comply with Apache License 2.0 for PP-DocLayoutV3 components.
