Testing OCR using vllm returns nothing but exclamation marks.

#1
by rsbdev - opened

I found the first version of this model quite useful, so I was excited to try out the newest version to see the improvement in OCR quality. Unfortunately, I can't get it to work using the provided vLLM usage example. Whether it's the PDF provided in the test code or my own local files, the OCR output is always just a string of "!!!!!!!!!!!!!!!!!" and nothing else.

Are you sure you're using the bf16 settings and not fp16? That's what usually happens when the dtype isn't correct.
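A quick sketch of why the wrong dtype can produce this symptom (just an illustration, not from this thread): bf16 keeps float32's exponent range, but fp16 tops out at 65504, so large activations overflow to inf, downstream logits turn into NaN, and the model can end up decoding a single repeated garbage token.

```python
import numpy as np

# bf16 shares float32's exponent range (~3.4e38), but fp16 maxes out at
# 65504. An activation magnitude that is fine in bf16 overflows to inf
# when cast to fp16, and inf - inf = nan, so the logits become garbage.
x = np.float32(70000.0)   # plausible activation magnitude
y = np.float16(x)         # overflows in half precision
print(y)                  # inf
print(y - y)              # nan
```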

I have a Turing GPU, so no bf16 support; vLLM automatically falls back to fp16.
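For reference, bf16 support can be inferred from the CUDA compute capability: Ampere (SM 8.0) and newer have hardware bfloat16, Turing (SM 7.5) and older do not, which is why vLLM falls back to fp16 there. A minimal sketch (the helper name is made up for illustration):

```python
def supports_bf16(major: int, minor: int) -> bool:
    """True if a GPU with this CUDA compute capability has bf16 support.

    Ampere (SM 8.0) introduced hardware bfloat16; anything older, such
    as Turing (SM 7.5, e.g. the T4), forces a fallback to fp16.
    """
    return (major, minor) >= (8, 0)

print(supports_bf16(7, 5))  # Turing / T4   -> False
print(supports_bf16(8, 0))  # Ampere A100   -> True
print(supports_bf16(8, 9))  # Ada RTX 4090  -> True
```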

LightOn AI org

Hello,
how about when using transformers? fp16 on a T4 with vLLM could be a vLLM issue.

Using transformers does work, but it is obviously much slower. I guess the issue might be with vLLM, even though I can still run the first model just fine.

I also encountered the same problem.

What GPU are you using?

LightOn AI org

Is it true for all the LightOnOCR-2 variants?

I tried the ocr-soup and bbox variants to check, and yes, same result on all three variants. I also tried other attention backends in vLLM, like 'TRITON_ATTN' and 'FLEX_ATTENTION', to no avail.

LightOn AI org

That's weird, as we have used vLLM for all our evaluations, and the demo for the first two models runs on vLLM too. Is it the case for all samples or just a particular one?

Maybe the missing bf16 support is the cause? I am also encountering this issue on Google Colab with their T4 GPU, which has no bf16 support.

I also encountered this issue. I am using an Nvidia 4090 graphics card, and this is how I resolved it:

    vllm serve $MODEL_NAME \
        --host 0.0.0.0 \
        --port 12001 \
        --gpu-memory-utilization 0.85 \
        --max-model-len 10000 \
        --tensor-parallel-size 1 \
        --dtype bfloat16 \
        --seed 2601 \
        --limit-mm-per-prompt '{"image": 1}' \
        --mm-processor-cache-gb 0 \
        --no-enable-prefix-caching

Key parameter change: --dtype bfloat16

That makes sense, since your GPU supports bf16; the issue very much seems to be with GPUs that do not. @staghado , have you tested vLLM on Turing at all?

When I run with these settings on an A100, it's actually slower than Dots.OCR. Am I missing any settings, or do I need to choose a different GPU?

LightOn AI org

We have conducted our benchmarks on Hopper GPUs. That being said, I don't expect the GPU type to make a difference; could you share more about how you are benchmarking: sequence length, number of requests sent in parallel, image resolution, etc.?
Here is a minimal vllm serve command:

    vllm serve $MODEL_NAME \
        --host 0.0.0.0 \
        --port $VLLM_PORT \
        --tensor-parallel-size 1 \
        --limit-mm-per-prompt '{"image": 1}' \
        --mm-processor-cache-gb 0 \
        --no-enable-prefix-caching \
        --max-model-len 8192
