Throughput is slower than PaddleOCR-VL
Hello,
In your blog post you claim a throughput two times faster than PaddleOCR-VL.
But in my tests using vLLM for both models on an RTX 4000 Ada, your model is 3.3x slower. (I process each page sequentially.)
Were your tests made with the transformers backend, or should the claim also hold with vLLM?
Maybe I'm missing an optimization (I'm following the example on this page).
Thank you
I'm currently processing about one page every ~10 seconds... I hope it's due to my local 5070 GPU. I'd like to test it on a rented H100.
Hi,
For throughput, we usually aim to maximize GPU memory utilization; processing pages sequentially (one page at a time) hurts throughput. To increase it, you'd need to send multiple concurrent requests to the vLLM server. It is somewhat expected that pipeline-based approaches (such as PaddleOCR-VL) are faster in the single-page case, but their overheads add up in the high-batch-size regime.
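A minimal sketch of the concurrent-request pattern described above. Here `ocr_page` is a hypothetical placeholder for the actual per-page call to the vLLM server (in practice you'd POST each page to the OpenAI-compatible endpoint, e.g. with the `openai` client); the worker count is an assumption you'd tune to your GPU:

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_page(page):
    # Placeholder for the real request: in practice this would send the
    # page image to the vLLM server and return the recognized text.
    return f"text of page {page}"

def ocr_document(pages, max_workers=8):
    # Submitting many pages at once lets vLLM batch them on the GPU;
    # a sequential loop leaves most of the GPU idle between requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_page, pages))

results = ocr_document(range(4))
```

With a real server, in-flight requests are what the continuous batching scheduler packs together, which is why throughput scales with concurrency rather than with per-page latency.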
Check the v1 blog post, which details exactly how the experiments were run for all models.
