Apple users, fear not: you can use MLX too!
If you are like me, love this model, but use a Mac for work, you may have been sad to discover that a long PDF extraction took 2 hours and was unusable. Well, that's OK, because you can also use mlx-vlm as a backend! The model uses Pixtral inside, which mlx-vlm already supports, so the long PyTorch inference code can be condensed down to something as easy as:
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("lightonai/LightOnOCR-2-1B", fix_mistral_regex=True)

url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"
prompt = apply_chat_template(processor, config=model.config, prompt="", num_images=1)

output = generate(model, processor, prompt, image=[url], max_tokens=1024, verbose=False)
print(output.text)
```
I did notice the Pixtral implementation there wasn't using scaled dot-product attention, so I put together a PR to update that, which will hopefully be merged shortly; it results in ~20% faster vision encoding.
In any case, a 2-hour PDF extraction turned into ~25 minutes with this backend instead. You can also easily quantize the model to 4 or 8 bits. After some testing I found that for my use case I could use a 4-bit quantization, turn the scale down to 1 for the PDFs, and spawn multiple workers in parallel (I tried adding batched vision embedding support, but it wasn't very effective, and MLX obviously doesn't support paged attention, so not much to be done there). That got the overall conversion down to ~7 minutes with no practical quality deterioration either: most of the hits were in things like bolding text properly, using the right Markdown heading level, or turning the table of contents into an actual table, none of which I really care about, and performance on my little eval dataset of questions about the document didn't change.
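For the curious, the parallel-worker part is just standard multiprocessing; here's a rough sketch, where `ocr_pages()` is a hypothetical stand-in for each worker loading its own quantized model with mlx-vlm and running inference per page:

```python
# Rough sketch of the parallel-worker setup. ocr_pages() is a placeholder:
# in the real run each worker loads its own copy of the 4-bit model and
# calls generate() per page.
from concurrent.futures import ProcessPoolExecutor

def chunk(pages, n_workers):
    """Split a list of page numbers into up to n_workers contiguous chunks."""
    k, r = divmod(len(pages), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + k + (1 if i < r else 0)
        chunks.append(pages[start:end])
        start = end
    return [c for c in chunks if c]

def ocr_pages(pages):
    # placeholder for per-worker model load + inference
    return [f"page {p}: <extracted text>" for p in pages]

if __name__ == "__main__":
    pages = list(range(200))
    with ProcessPoolExecutor(max_workers=4) as pool:
        texts = [t for part in pool.map(ocr_pages, chunk(pages, 4)) for t in part]
```

One process per worker matters here since each worker holds its own model weights; threads wouldn't buy you anything for this workload.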
## a100_gpu vs mlx_parallel_opt
Common pages: 214
### Overall Similarity
- Character similarity: 94.05%
- Word similarity: 97.1%
- Line similarity: 76.8%
- Total chars: 523,063 vs 521,690
- Total words: 80,923 vs 81,078
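Incidentally, similarity numbers like these are straightforward to produce with the standard library; a sketch of the kind of comparison I ran (via `difflib.SequenceMatcher`, not necessarily the exact metric behind the numbers above):

```python
# Sketch: sequence similarity at character, word, and line granularity.
# The exact script behind the percentages above may have differed,
# but the idea is the same.
from difflib import SequenceMatcher

def ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()

def compare(text_a: str, text_b: str) -> dict:
    return {
        "char": ratio(text_a, text_b),
        "word": ratio(text_a.split(), text_b.split()),
        "line": ratio(text_a.splitlines(), text_b.splitlines()),
    }
```

Run it page-by-page over both backends' outputs and average, and you get character/word/line percentages of the sort reported above.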
Pretty incredible: it actually came in ~1-2 minutes faster than the full-precision, full-scale model on an A100! Obviously, if I also reduced the quantization and scale on the A100 side it would probably blow past the Mac, but beggars can't be choosers; those things are expensive for a reason. The quantization process is really easy with mlx-vlm, but if there's any interest I could throw the versions I made up onto Hugging Face as well.
Anyways, thanks for the HTML support in this version; my use case is heavily built around nested, incredibly ugly tables split across multiple pages, which Markdown always struggled to reproduce. This has worked perfectly for me so far and enabled a new project idea with just a tiny bit of post-processing to connect split tables at lookup time. Cheers to my goats at LightOn!
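For anyone wondering what that split-table post-processing looks like, the core idea is simple: if one page's HTML ends with a table and the next page opens with one, splice the continuation's rows into the first table. A toy sketch (flat tables only; `stitch_tables()` is my own helper, not anything LightOnOCR ships, and a real version would also reconcile repeated header rows):

```python
# Sketch of "connect split tables at lookup time": merge a table that the
# OCR output split across a page boundary. Handles only flat <table> markup;
# nested tables and repeated headers need more care.
import re

def stitch_tables(pages):
    """Merge tables split across consecutive page boundaries."""
    out = [pages[0]]
    for page in pages[1:]:
        prev = out[-1].rstrip()
        m = re.match(r"\s*<table>(.*?)</table>", page, re.S)
        if prev.endswith("</table>") and m:
            # drop the previous closing tag, append the continuation's rows
            out[-1] = prev[: -len("</table>")] + m.group(1) + "</table>" + page[m.end():]
        else:
            out.append(page)
    return out
```

Running this at lookup time rather than at conversion time means the raw per-page outputs stay untouched on disk.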
EDIT: Tried optimizing my A100 workflow, and yeah, it gets down to ~45 seconds for ~200-page documents, so the Mac is definitely not going to beat that, but the fact that it's within an order of magnitude is still pretty good.
🐐