Tags: Image-Text-to-Text, Transformers, Safetensors, mistral3, text-generation, ocr, document-understanding, vision-language, pdf, tables, forms, conversational, Eval Results, 🇪🇺 Region: EU
Instructions for using lightonai/LightOnOCR-2-1B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use lightonai/LightOnOCR-2-1B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lightonai/LightOnOCR-2-1B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("lightonai/LightOnOCR-2-1B")
model = AutoModelForImageTextToText.from_pretrained("lightonai/LightOnOCR-2-1B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
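The Transformers snippets above reuse the same chat-message structure for every request; when running OCR over many pages, a small helper keeps that payload consistent. A minimal sketch (the function name and OCR prompt text are illustrative assumptions, not part of the official snippets):

```python
# Build the image-text-to-text chat payload used by the pipeline and
# apply_chat_template calls above. Function name and prompt are assumptions.
def build_image_messages(image_url: str, prompt: str) -> list:
    """Return a messages list in the image-text-to-text chat format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]

messages = build_image_messages(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    "Transcribe the text in this document.",
)
```

The resulting list can be passed directly to `pipe(text=messages)` or to `processor.apply_chat_template(...)` as shown above.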
- Local Apps
- vLLM
How to use lightonai/LightOnOCR-2-1B with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "lightonai/LightOnOCR-2-1B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightonai/LightOnOCR-2-1B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/lightonai/LightOnOCR-2-1B
```
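The curl call above can also be issued from Python with only the standard library; a minimal sketch, assuming the vLLM server from the previous step is running on localhost:8000 (the helper name is an illustrative assumption):

```python
# Stdlib-only equivalent of the curl request to the OpenAI-compatible API.
import json
import urllib.request

def build_chat_request(model: str, text: str, image_url: str) -> bytes:
    """Serialize an OpenAI-style chat-completions payload with one image."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request(
    "lightonai/LightOnOCR-2-1B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Sending the request requires the server to be up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```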
- SGLang
How to use lightonai/LightOnOCR-2-1B with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "lightonai/LightOnOCR-2-1B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightonai/LightOnOCR-2-1B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "lightonai/LightOnOCR-2-1B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightonai/LightOnOCR-2-1B",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use lightonai/LightOnOCR-2-1B with Docker Model Runner:
```shell
docker model run hf.co/lightonai/LightOnOCR-2-1B
```
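The vLLM and SGLang servers above both answer with OpenAI-style chat-completion responses, so the OCR text lives at the same path in the JSON regardless of backend. A minimal sketch of pulling it out (the sample dict mirrors the documented response shape; it is not a real server reply):

```python
# Extract the generated text from an OpenAI-compatible chat response.
def extract_text(response: dict) -> str:
    """Return the assistant message content from a chat-completions response."""
    return response["choices"][0]["message"]["content"]

# Illustrative response fragment shaped like the servers' output:
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The Statue of Liberty stands on an island in New York Bay.",
            }
        }
    ]
}
print(extract_text(sample))
```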
Community discussions:

- #41: Layout with bbox (by ariefwijaya, opened 16 days ago)
- #36: Built a real-time screen reader using LightOnOCR-2-1B (by paradisecy, opened about 1 month ago)
- #35: Convert to ONNX (by johnlockejrr, opened about 2 months ago)
- #34: Extremely slow speed (>1 min for one page) (by jojo71, opened about 2 months ago)
- #33: Excessive use of HTML in output (by abimaelmartell, opened 2 months ago)
- #30: RLVR strategy suggestions (by TheOfficialAJ, opened 3 months ago)
- #28: Accuracy decreases a lot in GGUF conversion (by fhaDL, opened 3 months ago)
- #27: How does it compare in benchmarks with the new PaddleOCR 1.5 model? (by dibu28, opened 3 months ago)
- #26: How do you calculate inference speed on OlmOCR-Bench? (by Piperino, opened 3 months ago)
- #25: Throughput is slower than PaddleOCR-VL (by TechNetiums, opened 3 months ago)
- #24: Regarding reasoning capability on images (by Ajayan, opened 3 months ago)
- #22: The exact vocab size of the model (by abdullahamlwakeb, opened 4 months ago)
- #21: Future suggestion about using available text layers (by jondecker76, opened 4 months ago)
- #20: Prompt of the demo web app (by vanhdz2611, opened 4 months ago)
- #19: What are the best settings to run with vLLM at very high concurrency? (by markwitt1, opened 4 months ago)
- #18: Fine-tuning for table bounding box extraction (by TheOfficialAJ, opened 4 months ago)
- #17: Fine-tune for structured extraction (by Glider95, opened 4 months ago)
- #16: x,y coordinates (by Superdooperhero, opened 4 months ago)
- #15: Incredible model, thank you! (by md-1415, opened 4 months ago)
- #14: Amazing model (by leniad, opened 4 months ago)
- #13: Experience on fine-tuning for a specific language (by Huy227, opened 4 months ago)
- #12: Fix typo in model architecture class name (by Xenova, opened 4 months ago)
- #11: Apple users, fear not: you can use MLX too! (by pherber3, opened 4 months ago)
- #10: LightOnOCR-2-1B with Samaritan Hebrew (by johnlockejrr, opened 4 months ago)
- #9: Any way to get a shrunk output to reduce tokens? (by Mr-Vader, opened 4 months ago)
- #8: Multipage documents (by jondecker76, opened 4 months ago)
- #6: Tables as Markdown instead of HTML (by jondecker76, opened 4 months ago)
- #5: Remove page numbers from the output (by Matthijz98, opened 4 months ago)
- #4: Hallucinations on empty pages (by elenapop, opened 4 months ago)
- #2: Checkboxes (by elenapop, opened 4 months ago)
- #1: Testing OCR using vLLM returns nothing but exclamation marks (by rsbdev, opened 4 months ago)