---
title: GLM-OCR Pruned 8-bit Safetensors (1.3GB)
emoji: πŸš€
license: mit
language:
- en
- fr
- es
- ru
- de
- ja
- ko
- zh
base_model:
- zai-org/GLM-OCR
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- pruning
- bitsandbytes
- int8
---
# GLM-OCR-Pruned-8bit
![Model Size](https://img.shields.io/badge/Disk-1.3GB-brightgreen) ![GPU](https://img.shields.io/badge/GPU-2.3GB-blue) ![Quant](https://img.shields.io/badge/8--bit-βœ…-orange)
**Production-ready GLM-OCR: 52% smaller on disk (2.7GB → 1.3GB), fully 8-bit, optimized for OCR**
## πŸ“Š Performance
| Metric | Original | **Optimized** |
|--------|----------|---------------|
| **Parameters** | 1.1B | **1.1B (4.3% sparsity)** |
| **Disk size** | 2.7GB | **1.3GB** (52% smaller) |
| **GPU memory** | 3.5GB+ | **2.3GB** |
| **Inference speed** | 1x | **~2-3x** |
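The disk reduction is consistent with 8-bit storage: roughly one byte per quantized weight instead of two for fp16/bf16. A back-of-envelope sketch (rough numbers only; the real checkpoint also stores unquantized protected layers, quantization scales, and metadata):

```python
# Rough arithmetic for 8-bit storage savings (a sketch, not exact file sizes).
params = 1.1e9            # total parameter count
fp16_bytes = params * 2   # fp16/bf16 storage: 2 bytes per weight
int8_bytes = params * 1   # int8 storage: ~1 byte per weight

print(f"fp16: {fp16_bytes / 1e9:.1f} GB")
print(f"int8: {int8_bytes / 1e9:.1f} GB")
print(f"reduction: {1 - int8_bytes / fp16_bytes:.0%}")
```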
## πŸš€ Quickstart
```python
from transformers import BitsAndBytesConfig, AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "ManiKumarAdapala/glm-ocr-pruned-8bit"

# Chat-format request: one image plus an OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "Image.jpeg"},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

# Load the checkpoint with bitsandbytes 8-bit quantization.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    quantization_config=quant_config,
    device_map="auto",
)

# Apply the chat template, tokenize, and move tensors to the model's device.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)

# Generate, then decode only the newly produced tokens.
generated_ids = model.generate(**inputs, max_new_tokens=8192)
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(output_text)
```
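Under `load_in_8bit=True`, bitsandbytes replaces `nn.Linear` layers with `Linear8bitLt`, which stores weights as int8 with per-row scaling. A toy NumPy sketch of absmax int8 quantization, purely illustrative of the idea (not the actual bitsandbytes implementation, which additionally handles outlier features in fp16):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)  # toy weight matrix

# Per-row absmax quantization: scale each row so its largest value maps to 127.
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.round(W / scale).astype(np.int8)    # stored 8-bit weights
W_deq = W_int8.astype(np.float32) * scale       # dequantized at matmul time

max_err = np.abs(W - W_deq).max()
print(f"max round-trip error: {max_err:.4f}")
```

The round-trip error is bounded by half a quantization step per row, which is why 8-bit weights usually cost little accuracy.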
## πŸ›  Optimizations Applied
- βœ… Selective pruning: q_proj, v_proj, fc2, and vision_tower weights (contributes, with quantization, to the 52% disk reduction)
- βœ… BitsAndBytes 8-bit quantization: Linear8bitLt layers across both the vision tower and the text decoder
- βœ… Protected layers: lm_head, early vision layers, and final decoder layers are left unpruned
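Selective pruning of this kind typically zeroes the smallest-magnitude weights in the chosen matrices. A minimal sketch of unstructured magnitude pruning at the card's 4.3% sparsity (illustrative only; the exact pruning procedure used for this checkpoint is not published here):

```python
import numpy as np

def magnitude_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    k = int(round(W.size * sparsity))
    if k == 0:
        return W.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a q_proj matrix
W_pruned = magnitude_prune(W, 0.043)              # 4.3% sparsity, as in the card
print(f"sparsity: {(W_pruned == 0).mean():.3f}")
```

Note that zeroed weights only shrink the file if the storage format exploits sparsity; the main disk savings here come from the 8-bit representation.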
## πŸ“š Citation
```bibtex
@misc{GLM-OCR-Pruned8bit-2026,
  author    = {Adapala, Mani Kumar and {ZAI-org}},
  title     = {GLM-OCR Pruned \& 8-bit Quantized (1.1B params, 4.3\% sparsity)},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adapala-manikumar/glm-ocr-pruned-8bit},
  note      = {1.3GB disk, 2.3GB GPU, OCR optimized, MIT license}
}
```
<font size="2">
**Acknowledgements (from ZAI-org/GLM-OCR)**
This project is inspired by the excellent work of:
- [PP-DocLayout-V3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) (Apache 2.0)
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [MinerU](https://github.com/opendatalab/MinerU)
**License Notice**: The GLM-OCR model is MIT licensed. When using the complete OCR pipeline, users should comply with Apache License 2.0 for PP-DocLayoutV3 components.
</font>