Bapt120 committed on
Commit
4f0ee23
·
verified ·
1 Parent(s): 9641ef2

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -31,9 +31,13 @@ tags:
31
 
32
  **Best OCR model (recommended).** LightOnOCR-2-1B is our flagship OCR model, refined with RLVR training for maximum accuracy. We recommend this variant for most OCR tasks.
33
 
 
 
 
 
34
  ## Highlights
35
 
36
- * ⚡ **Speed:** 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
37
  * 💸 **Efficiency:** Processes 5.71 pages/s on a single H100 (~493k pages/day) for **<$0.01 per 1,000 pages**
38
  * 🧠 **End-to-End:** Fully differentiable, no external OCR pipeline
39
  * 🧾 **Versatile:** Handles tables, receipts, forms, multi-column layouts, and math notation
@@ -41,7 +45,7 @@ tags:
41
 
42
  ---
43
 
44
- 📄 **[Paper](https://huggingface.co/papers/lightonocr-2)** | 📝 **[Blog Post](https://huggingface.co/blog/lightonai/lightonocr-2)** | 🚀 **[Demo](https://huggingface.co/spaces/lightonai/LightOnOCR-2-Demo)** | 📊 **[Dataset](https://huggingface.co/datasets/lightonai/LightOnOCR-mix-0126)**
45
 
46
  ---
47
 
@@ -61,7 +65,7 @@ tags:
61
  ## Benchmarks
62
 
63
  <div align="center">
64
- <img src="benchmark_placeholder.png" alt="OlmOCR-Bench Results" width="900"/>
65
  </div>
66
 
67
  *See the [paper](https://huggingface.co/papers/lightonocr-2) for full benchmark details and methodology.*
 
31
 
32
  **Best OCR model (recommended).** LightOnOCR-2-1B is our flagship OCR model, refined with RLVR training for maximum accuracy. We recommend this variant for most OCR tasks.
33
 
34
+ ## About LightOnOCR-2
35
+
36
+ LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.
37
+
38
  ## Highlights
39
 
40
+ * ⚡ **Speed:** 3.3× faster than Chandra OCR, 1.7× faster than OlmOCR, 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
41
  * 💸 **Efficiency:** Processes 5.71 pages/s on a single H100 (~493k pages/day) for **<$0.01 per 1,000 pages**
42
  * 🧠 **End-to-End:** Fully differentiable, no external OCR pipeline
43
  * 🧾 **Versatile:** Handles tables, receipts, forms, multi-column layouts, and math notation
 
45
 
46
  ---
47
 
48
+ 📄 **[Paper](https://huggingface.co/papers/lightonocr-2)** | 📝 **[Blog Post](https://huggingface.co/blog/lightonai/lightonocr-2)** | 🚀 **[Demo](https://huggingface.co/spaces/lightonai/LightOnOCR-2-1B-Demo)** | 📊 **[Dataset](https://huggingface.co/datasets/lightonai/LightOnOCR-mix-0126)**
49
 
50
  ---
51
 
 
65
  ## Benchmarks
66
 
67
  <div align="center">
68
+ <img src="benchmark.png" alt="OlmOCR-Bench Results" width="900"/>
69
  </div>
70
 
71
  *See the [paper](https://huggingface.co/papers/lightonocr-2) for full benchmark details and methodology.*
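
The Efficiency bullet in the Highlights converts 5.71 pages/s into roughly 493k pages/day. A minimal sketch of that arithmetic follows, assuming the throughput is sustained for a full 24 hours with no idle time; the helper name `pages_per_day` is hypothetical and not part of the README or the model's code.

```python
# Sanity check for the README's throughput claim.
# Assumption: the quoted 5.71 pages/s is sustained around the clock.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_day(pages_per_second: float) -> float:
    """Convert a sustained pages-per-second rate into pages per day."""
    return pages_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    daily = pages_per_day(5.71)
    # Rounds to about 493k, matching the "~493k pages/day" figure in the README.
    print(f"{daily:,.0f} pages/day")
```

Real deployments would likely see a lower daily total once batching gaps, model loading, and I/O are taken into account.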