GLM-OCR very slow on Tesla T4 (~40s per image) even with GPU - is this expected?

#13 · opened by 905saini

Hi,

I’m testing GLM-OCR on Google Colab with a Tesla T4 (15GB VRAM).

Setup (a rough code sketch follows the list):

- Model: zai-org/GLM-OCR
- Image size: 1024×1024
- max_new_tokens: 2048
- GPU utilization ~60%, VRAM ~4.4GB
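
For reference, a minimal sketch of the kind of loading/generation code I'm running. The `AutoModelForImageTextToText`/`AutoProcessor` classes, the prompt string, and the fp16/device settings here are placeholders, so check the model card for the exact API:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-OCR"

# Assumed loading path; the model card may require different classes
# or trust_remote_code=True.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 matches the ~4.4GB VRAM observed
    device_map="cuda",
)

image = Image.open("page.png").convert("RGB").resize((1024, 1024))
prompt = "Extract all text from this image."  # placeholder prompt
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=2048)

print(processor.decode(output[0], skip_special_tokens=True))
```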

However, inference time is still ~40 seconds per image.
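
For anyone reproducing the number, a minimal timing sketch (reusing `model`/`inputs` from the sketch above; `torch.cuda.synchronize` ensures the GPU work is fully counted):

```python
import time
import torch

torch.cuda.synchronize()   # finish any pending GPU work first
start = time.perf_counter()
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=2048)
torch.cuda.synchronize()   # wait for generation to complete on the GPU
print(f"inference: {time.perf_counter() - start:.1f}s")
```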

Questions:

  1. Is ~40–50s on T4 expected for GLM-OCR?
  2. Any recommended settings for faster inference?
