Testing OCR using vllm returns nothing but exclamation marks.

by rsbdev - opened Jan 19

Discussion

rsbdev

Jan 19

I found the first version of this model quite useful, so I was excited to try the newest version and see the improvement in OCR quality. Unfortunately, I can't get it to work using the provided vLLM usage example: whether it's the PDF provided in the test code or my own local files, the OCR output is always just a bunch of "!!!!!!!!!!!!!!!!!" and nothing else.

twoxfh

Jan 20

Are you sure you're using the bf16 setting and not fp16? That's what usually happens when the dtype is not correct.
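
For context, a commonly cited explanation (not confirmed anywhere in this thread) is that fp16 has a much narrower numeric range than bf16, so values that are fine in bf16 can overflow in fp16 and corrupt the output. The sketch below only illustrates the range difference with the standard library; `fits_fp16` and `to_bf16` are hypothetical helper names, and the bf16 conversion truncates where real hardware would round.

```python
import struct


def fits_fp16(x: float) -> bool:
    """Return True if x can be stored as an IEEE half-precision (fp16) value."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Approximate bf16 by keeping only the top 16 bits of the float32 encoding.

    This truncates instead of rounding, which is enough to show the range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(fits_fp16(65504.0))  # largest finite fp16 value
print(fits_fp16(70000.0))  # beyond the fp16 range
print(to_bf16(70000.0))    # bf16 keeps the magnitude, with less precision
```

bf16 trades mantissa bits for the same exponent range as float32, which is why a model trained in bf16 may not run cleanly when forced into fp16.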

rsbdev

Jan 20

I have a Turing GPU, so there is no bf16 support; vLLM automatically falls back to fp16.
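
A minimal sketch of the check behind this fallback, assuming the usual rule that native bf16 needs CUDA compute capability 8.0 (Ampere) or newer. The table and the function names `supports_native_bf16` and `pick_dtype` are illustrative, not part of vLLM; with PyTorch installed, the capability tuple can be read from `torch.cuda.get_device_capability()`.

```python
# Compute capabilities of the GPUs mentioned in this thread.
GPU_COMPUTE_CAPABILITY = {
    "T4": (7, 5),        # Turing
    "A100": (8, 0),      # Ampere
    "RTX 4090": (8, 9),  # Ada Lovelace
    "H100": (9, 0),      # Hopper
}


def supports_native_bf16(capability: tuple) -> bool:
    """Native bf16 arithmetic starts with compute capability 8.0."""
    return tuple(capability) >= (8, 0)


def pick_dtype(capability: tuple) -> str:
    """Pick the dtype string to pass to --dtype for a given GPU."""
    return "bfloat16" if supports_native_bf16(capability) else "float16"


for name, cap in GPU_COMPUTE_CAPABILITY.items():
    print(f"{name}: {pick_dtype(cap)}")
```

On a T4 this selects float16, which matches the fallback described above.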

staghado

LightOn AI org Jan 20

Hello,
How about when using transformers? fp16 on a T4 with vLLM could be a vLLM issue.

rsbdev

Jan 20

Using transformers does work but is obviously much, much slower. I guess the issue might be with vLLM, even though I can still run the first model just fine.

Xiakj

Jan 21

I also encountered the same problem.

rsbdev

Jan 21

What GPU are you using?

staghado

LightOn AI org Jan 21

Is it true for all the LightOnOCR-2 variants?

rsbdev

Jan 21

edited Jan 21

I tried the ocr-soup and bbox variants to check and yes, same result on all 3 variants. I also tried other attention backends in vLLM, like 'TRITON_ATTN' and 'FLEX_ATTENTION', to no avail.

staghado

LightOn AI org Jan 22

That's weird, as we have used vLLM for all our evaluations, and the demo is running on vLLM too for the first two models. Is it the case for all samples or just a particular one?

viennadd

Jan 23

Regarding rsbdev's Turing GPU having no bf16 support, so that vLLM falls back to fp16: maybe this is the cause? I am also encountering this issue on Google Colab with their T4 GPU, which has no bf16 support.

Charlie33

Jan 25

I also encountered this issue. I am using an Nvidia 4090 graphics card, and this is how I resolved it:

    vllm serve $MODEL_NAME \
        --host 0.0.0.0 \
        --port 12001 \
        --gpu-memory-utilization 0.85 \
        --max-model-len 10000 \
        --tensor-parallel-size 1 \
        --dtype bfloat16 \
        --seed 2601 \
        --limit-mm-per-prompt '{"image": 1}' \
        --mm-processor-cache-gb 0 \
        --no-enable-prefix-caching

Key parameter change: --dtype bfloat16
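
A rough client-side sketch for a server started this way, assuming it exposes vLLM's OpenAI-compatible `/v1/chat/completions` endpoint on port 12001. The helper names (`build_ocr_request`, `looks_degenerate`, `ocr_page`), the `max_tokens` value, and the PNG default are assumptions for illustration. The degenerate-output check flags the all-exclamation-mark failure described in this thread.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes: bytes, model: str, mime: str = "image/png") -> dict:
    """Build a chat-completions payload carrying one image as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}
                ],
            }
        ],
        "max_tokens": 4096,  # assumed budget for one page
    }


def looks_degenerate(text: str, min_len: int = 8) -> bool:
    """True if the output is just one character repeated, e.g. '!!!!!!!!'."""
    stripped = "".join(text.split())
    return len(stripped) >= min_len and len(set(stripped)) == 1


def ocr_page(image_bytes: bytes, model: str, base_url: str = "http://localhost:12001") -> str:
    """Send one page to the server and fail loudly on degenerate output."""
    payload = json.dumps(build_ocr_request(image_bytes, model)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    text = body["choices"][0]["message"]["content"]
    if looks_degenerate(text):
        raise RuntimeError("Degenerate OCR output; check the --dtype the server is using")
    return text
```

Calling `ocr_page(open("page.png", "rb").read(), "lightonai/LightOnOCR-2-1B")` would return the transcription, where `page.png` is a placeholder file name.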

rsbdev

Jan 25

That makes sense, since your GPU supports bf16; the issue seems to be very much with GPUs that do not. @staghado, have you tested vLLM on Turing at all?

markwitt1

Jan 28

When I run with these settings on an A100, it's actually slower than Dots.OCR. Am I missing any settings, or do I need to choose a different GPU?

staghado

LightOn AI org Jan 28

We have conducted our benchmarks using Hopper GPUs. That being said, I don't expect GPU type to make a difference. Could you share more about how you are benchmarking: sequence length, number of requests sent in parallel, image resolution, etc.?
Here is a minimal vllm serve command:

    vllm serve $MODEL_NAME \
        --host 0.0.0.0 \
        --port $VLLM_PORT \
        --tensor-parallel-size 1 \
        --limit-mm-per-prompt '{"image": 1}' \
        --mm-processor-cache-gb 0 \
        --no-enable-prefix-caching \
        --max-model-len 8192
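
To make the benchmarking details above easier to report, here is a small harness sketch that measures latency and throughput at a fixed concurrency. It is not LightOn's benchmark setup; `run_benchmark` and `send_fn` are hypothetical names, and `send_fn` stands in for whatever function sends one page to the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(send_fn, requests, concurrency: int) -> dict:
    """Send every item in `requests` through `send_fn` with a fixed number of workers."""
    items = list(requests)

    def timed(item):
        # Time a single request from submission to response.
        start = time.perf_counter()
        send_fn(item)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, items))
    wall = time.perf_counter() - wall_start

    return {
        "num_requests": len(latencies),
        "concurrency": concurrency,
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        "requests_per_s": len(latencies) / wall,
    }


# Stand-in workload: replace the lambda with a real request function.
print(run_benchmark(lambda _: time.sleep(0.01), range(8), concurrency=4))
```

Reporting these numbers together with the image resolution and `--max-model-len` makes results from different GPUs comparable.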
