Throughput is slower than PaddleOCR-VL

#25
by TechNetiums - opened

Hello,
In your blog you claim a throughput two times faster than PaddleOCR-VL.


But in my tests using vLLM for both models on an RTX 4000 Ada, your model is 3.3x slower. (I process each page sequentially.)

Were your tests made with the Transformers backend, or should the claim also hold with vLLM?
Maybe I'm missing an optimization (I'm following the example on this page).

Thank you

I'm currently processing about one page every ~10 seconds... I hope it's due to my local 5070 GPU. I'd like to test it on a rented H100.

LightOn AI org

Hi,
For throughput, we usually aim to maximize GPU memory utilization, doing it sequentially(1 page at a time) would hurt throughput. So in order to increase throughput you'd need to send multiple requests to the vLLM server. It is somewhat expected that pipeline based approaches(such as PadlleOCR-VL) would be better in the case of 1 page but the overheads sum up in the high batch size regime.
Check the v1 blog which details exactly how the experiments were run for all models.
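As a minimal sketch of the concurrent-request pattern described above: instead of a sequential per-page loop, keep many pages in flight at once so vLLM's continuous batching can fill the GPU. The helper below is an illustration, not LightOn's benchmark code; `send_request` is a placeholder for whatever performs your actual single-page OCR call (e.g. an OpenAI-compatible `/v1/chat/completions` POST to the vLLM server), and `max_workers=8` is an arbitrary starting point to tune.

```python
# Sketch: parallel page submission to a vLLM server (assumptions noted above).
from concurrent.futures import ThreadPoolExecutor

def ocr_pages_concurrently(pages, send_request, max_workers=8):
    """Run one OCR request per page in parallel, preserving page order.

    `send_request` is a hypothetical single-page call (e.g. an HTTP POST
    to a vLLM OpenAI-compatible endpoint); `max_workers` bounds how many
    requests are in flight at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Unlike a sequential for-loop, many pages are in flight at once,
        # letting vLLM batch them together on the GPU.
        return list(pool.map(send_request, pages))
```

A thread pool is enough here because each worker spends its time waiting on network I/O, not computing; an `asyncio`-based client would serve the same purpose.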
