GLM-OCR very slow on Tesla T4 (~40s per image) even with GPU - is this expected?

#13 · opened by 905saini

Hi,

I’m testing GLM-OCR on Google Colab with a Tesla T4 (15GB VRAM).

Setup (a rough code sketch follows the list):

- Model: zai-org/GLM-OCR
- Image size: 1024×1024
- max_new_tokens: 2048
- GPU utilization ~60%, VRAM ~4.4GB
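
For reference, a minimal sketch of the kind of loading/generation code I'm running. The `AutoModelForImageTextToText`/`AutoProcessor` classes, the prompt string, and the fp16/device settings here are placeholders, so check the model card for the exact API:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-OCR"

# Assumed loading path; the model card may require different classes
# or trust_remote_code=True.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 matches the ~4.4GB VRAM observed
    device_map="cuda",
)

image = Image.open("page.png").convert("RGB").resize((1024, 1024))
prompt = "Extract all text from this image."  # placeholder prompt
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=2048)

print(processor.decode(output[0], skip_special_tokens=True))
```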

However, inference time is still ~40 seconds per image.
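
For anyone reproducing the number, a minimal timing sketch (reusing `model`/`inputs` from the sketch above; `torch.cuda.synchronize` ensures the GPU work is fully counted):

```python
import time
import torch

torch.cuda.synchronize()   # finish any pending GPU work first
start = time.perf_counter()
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=2048)
torch.cuda.synchronize()   # wait for generation to complete on the GPU
print(f"inference: {time.perf_counter() - start:.1f}s")
```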

Questions:

  1. Is ~40–50s on T4 expected for GLM-OCR?
  2. Any recommended settings for faster inference?
