Testing OCR using vllm returns nothing but exclamation marks.
I found the first version of this model quite useful, so I was excited to try the newest version and see the improvement in OCR quality. Unfortunately, I can't get it to work using the provided vLLM usage example. Whether I use the PDF from the test code or my own local files, the OCR output is always just a string of "!!!!!!!!!!!!!!!!!" and nothing else.
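For anyone batch-processing pages, a small check like the one below could flag this failure mode automatically instead of silently saving garbage. It is only a sketch: `looks_degenerate` and its `min_len` threshold are hypothetical names chosen for illustration, not part of vLLM or the model's tooling.

```python
def looks_degenerate(text: str, min_len: int = 8) -> bool:
    """Return True if the output is a long run of one repeated character.

    Hypothetical helper: the threshold of 8 characters is an arbitrary
    assumption, not a value taken from the model or from vLLM.
    """
    stripped = "".join(text.split())  # drop all whitespace before checking
    if len(stripped) < min_len:
        return False
    return len(set(stripped)) == 1


print(looks_degenerate("!!!!!!!!!!!!!!!!!"))  # True
print(looks_degenerate("Invoice no. 4821"))   # False
```

A page flagged this way could then be retried through the Transformers path or logged for inspection.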
Are you sure you're using the bf16 setting and not fp16? That's what usually happens when the dtype isn't correct.
I have a Turing GPU, so there is no bf16 support; vLLM automatically falls back to fp16.
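To know ahead of time which dtype a given card can use, the decision can be sketched from the CUDA compute capability. This assumes the usual rule that native bf16 starts at compute capability 8.0 (Ampere); `pick_dtype` is a hypothetical helper, not a vLLM function.

```python
def pick_dtype(major: int, minor: int = 0) -> str:
    """Pick a dtype name from a CUDA compute capability (major, minor).

    Assumption: bf16 is natively supported from compute capability 8.0
    (Ampere) upward; older cards such as Turing (7.5) fall back to fp16.
    """
    return "bfloat16" if (major, minor) >= (8, 0) else "float16"


# With PyTorch installed, the capability could be read with
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
print(pick_dtype(7, 5))  # Turing, e.g. T4        -> float16
print(pick_dtype(8, 9))  # Ada, e.g. RTX 4090     -> bfloat16
```

If this returns `float16` for your GPU, you are in the configuration where the problem in this thread shows up.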
Hello,
How about when using Transformers? fp16 on a T4 with vLLM could be a vLLM issue.
Using Transformers does work but is obviously much slower. I guess the issue might be with vLLM, even though I can still run the first model just fine.
I also encountered the same problem.
What GPU are you using?
Is it true for all the LightOnOCR-2 variants?
I tried the ocr-soup and bbox variants to check, and yes, I get the same result on all 3 variants. I also tried other attention backends in vLLM, such as 'TRITON_ATTN' and 'FLEX_ATTENTION', to no avail.
That's weird, as we have used vLLM for all our evaluations, and the demo is running on vLLM too for the first two models. Is it the case for all samples or just a particular one?
Maybe this is the cause? I am also encountering this issue on Google Colab with their T4 GPU, which has no bf16 support.
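One possible explanation, not confirmed anywhere in this thread, is the much narrower range of fp16: a value that bf16 represents (coarsely) can overflow in fp16. The sketch below illustrates only that numeric difference using the standard library; `fits_fp16` and `to_bf16` are hypothetical helper names, and the bf16 conversion is a simple truncation of a float32, which is an approximation of real hardware rounding.

```python
import struct


def fits_fp16(x: float) -> bool:
    """Return True if x can be packed as an IEEE half-precision float."""
    try:
        struct.pack("<e", x)  # 'e' is the half-precision format code
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Emulate bf16 by keeping only the top 16 bits of the float32 encoding."""
    top = struct.pack(">f", x)[:2]  # sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack(">f", top + b"\x00\x00")[0]


print(fits_fp16(65504.0))  # True: the largest finite fp16 value
print(fits_fp16(70000.0))  # False: overflows the fp16 range
print(to_bf16(70000.0))    # 69632.0: representable in bf16, with less precision
```

Whether an overflow of this kind is what actually happens inside the model on Turing cards would need to be checked by the maintainers.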
I also encountered this issue. I am using an Nvidia 4090 graphics card, and this is how I resolved it:
vllm serve $MODEL_NAME \
--host 0.0.0.0 \
--port 12001 \
--gpu-memory-utilization 0.85 \
--max-model-len 10000 \
--tensor-parallel-size 1 \
--dtype bfloat16 \
--seed 2601 \
--limit-mm-per-prompt '{"image": 1}' \
--mm-processor-cache-gb 0 \
--no-enable-prefix-caching
Key parameter change: --dtype bfloat16
That makes sense, since your GPU supports bf16; the issue very much seems to be with GPUs that do not. @staghado, have you tested vLLM on Turing at all?
When I run with these settings on an A100, it's actually slower than Dots.OCR. Am I missing any settings, or do I need to choose a different GPU?
We conducted our benchmarks on Hopper GPUs. That said, I don't expect the GPU type to make a difference. Could you share more about how you are benchmarking: sequence length, number of requests sent in parallel, image resolution, etc.?
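To make numbers comparable, it may help to report throughput together with the concurrency used. Below is a rough sketch of such a measurement; `run_benchmark` and `fake_send` are hypothetical names, and `fake_send` merely stands in for a real HTTP call to the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(send, payloads, concurrency=8):
    """Send every payload using `concurrency` worker threads and time it.

    `send` is any callable taking one payload (hypothetical interface).
    """
    def timed(payload):
        start = time.perf_counter()
        send(payload)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, payloads))
    wall = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "concurrency": concurrency,
        "requests_per_second": len(latencies) / wall if wall > 0 else float("inf"),
        "mean_latency_s": sum(latencies) / len(latencies) if latencies else 0.0,
    }


def fake_send(payload):
    """Stand-in for a real request; sleeps instead of calling a server."""
    time.sleep(0.01)


print(run_benchmark(fake_send, [{"page": i} for i in range(16)], concurrency=4))
```

Sending requests one at a time tends to understate what a batching server like vLLM can do, so the concurrency value is worth stating alongside any pages-per-second figure.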
Here is a minimal vllm serve command:
vllm serve $MODEL_NAME \
--host 0.0.0.0 \
--port $VLLM_PORT \
--tensor-parallel-size 1 \
--limit-mm-per-prompt '{"image": 1}' \
--mm-processor-cache-gb 0 \
--no-enable-prefix-caching \
--max-model-len 8192
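Once the server is up, a page image can be sent through the OpenAI-compatible chat completions endpoint. The following is a minimal sketch using only the standard library; the helper names, the image-only message, `max_tokens=4096`, and `temperature=0.0` are assumptions for illustration, so check the model card for the recommended request settings.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes: bytes, model: str,
                      mime: str = "image/png", max_tokens: int = 4096) -> dict:
    """Build a chat completions payload with one image as a base64 data URL.

    Assumptions: image-only user message, greedy decoding, 4096 new tokens.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}},
            ],
        }],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def post_chat_completion(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to {base_url}/v1/chat/completions and return the text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage would look like `post_chat_completion("http://localhost:8000", build_ocr_request(open("page.png", "rb").read(), "lightonai/LightOnOCR-2-1B"))`, with the host and port adjusted to match the values passed to `vllm serve`; `page.png` is a placeholder file name.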