GLM-OCR very slow on Tesla T4 (~40s per image) even with GPU: is this expected?
#13 opened by 905saini
Hi,
I'm testing GLM-OCR on Google Colab with a Tesla T4 (15GB VRAM).
Setup:
- Model: zai-org/GLM-OCR
- Image size: 1024×1024
- max_new_tokens: 2048
- GPU utilization: ~60%, VRAM: ~4.4GB
However, inference time is still ~40 seconds per image.
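For reference, here is a minimal sketch of roughly how I'm running inference. Note this is not GLM-OCR's documented API: the `AutoModelForImageTextToText` / `AutoProcessor` classes, the prompt string, and the image filename are assumptions based on how other vision-language models on the Hub are typically loaded, so the exact calls may differ for this model.

```python
# Sketch only: class names, prompt format, and file paths are assumptions,
# not GLM-OCR's confirmed API.
import time

import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-OCR"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 to fit comfortably in T4 VRAM
    device_map="cuda",
    trust_remote_code=True,
)

# 1024x1024 input, as in the setup above; "page.png" is a placeholder path
image = Image.open("page.png").convert("RGB").resize((1024, 1024))
inputs = processor(
    images=image, text="OCR this image.", return_tensors="pt"
).to("cuda")

start = time.time()
output_ids = model.generate(**inputs, max_new_tokens=2048)
text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(text)
print(f"Inference took {time.time() - start:.1f}s")  # ~40s per image here
```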
Questions:
- Is ~40-50s on a T4 expected for GLM-OCR?
- Are there any recommended settings for faster inference?