wangjazz committed · Commit 783aa2f · verified · 1 parent: 89b96e1

Upload 2 files

Files changed (3):
  1. .gitattributes +1 -0
  2. LightOnOCR-2-1B-Q4_K_M.gguf +3 -0
  3. README.md +146 -3
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ LightOnOCR-2-1B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
LightOnOCR-2-1B-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26cf61864df5b3639d8af791ae418bb59bf94d65b57422f2f174b072ea0ba049
size 396701312
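The three lines above are a git-lfs pointer: the repository stores this small stub while the 396 MB blob lives in LFS storage. A minimal sketch of reading one (the key/value format follows the spec URL named in the pointer itself):

```python
# Parse a git-lfs pointer stub to recover the blob's hash and size.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:26cf61864df5b3639d8af791ae418bb59bf94d65b57422f2f174b072ea0ba049
size 396701312
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "key value"; the oid value is "algorithm:hex-digest".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(POINTER)
```

After downloading the actual file, `sha256sum LightOnOCR-2-1B-Q4_K_M.gguf` should match `info["digest"]`.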
README.md CHANGED
@@ -1,3 +1,146 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
base_model: lightonai/LightOnOCR-2-1B
tags:
- ocr
- document-understanding
- vision-language
- gguf
- llama.cpp
- multimodal
language:
- en
- fr
- de
- es
- it
- nl
- pt
- sv
- da
- zh
- ja
library_name: gguf
pipeline_tag: image-text-to-text
---

# LightOnOCR-2-1B GGUF

GGUF quantized versions of [lightonai/LightOnOCR-2-1B](https://huggingface.co/lightonai/LightOnOCR-2-1B) for use with [llama.cpp](https://github.com/ggml-org/llama.cpp).

## Model Description

LightOnOCR-2-1B is a 1B-parameter end-to-end vision-language model for OCR that converts documents (PDFs, scans, images) into clean, naturally ordered text.

### Highlights

- **Speed:** 3.3× faster than Chandra OCR, 1.7× faster than OlmOCR
- **Efficiency:** <$0.01 per 1,000 pages on an H100
- **End-to-end:** Fully differentiable, with no external OCR pipeline
- **Versatile:** Handles tables, receipts, forms, multi-column layouts, and math notation

## Available Files

| File | Size | Description |
|------|------|-------------|
| `LightOnOCR-2-1B-f16.gguf` | 1.1 GB | Language model (F16, highest quality) |
| `LightOnOCR-2-1B-Q8_0.gguf` | 610 MB | Language model (Q8_0, near-lossless) |
| `LightOnOCR-2-1B-Q4_K_M.gguf` | 378 MB | Language model (Q4_K_M, balanced) |
| `LightOnOCR-2-1B-mmproj-f16.gguf` | 781 MB | Vision encoder + projector (required) |

> **Note:** The vision encoder (`mmproj`) should NOT be quantized; quantizing it significantly degrades image understanding quality.

## Usage with llama.cpp

### Build llama.cpp

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

### Run OCR

```bash
# Using F16 (highest quality)
./build/bin/llama-mtmd-cli \
  -m LightOnOCR-2-1B-f16.gguf \
  --mmproj LightOnOCR-2-1B-mmproj-f16.gguf \
  --image your-document.png \
  -ngl 99 \
  -c 4096 \
  -n 1000 \
  --temp 0.2 \
  --repeat-penalty 1.15 \
  --repeat-last-n 128 \
  -p "OCR this image"

# Using Q4_K_M (smaller, faster)
./build/bin/llama-mtmd-cli \
  -m LightOnOCR-2-1B-Q4_K_M.gguf \
  --mmproj LightOnOCR-2-1B-mmproj-f16.gguf \
  --image your-document.png \
  -ngl 99 \
  -c 4096 \
  -n 1000 \
  --temp 0.2 \
  --repeat-penalty 1.15 \
  -p "OCR this image"
```
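The same model files can also be served over HTTP. A hedged sketch: assuming a `llama-server` built from the same tree accepts `--mmproj` and exposes an OpenAI-compatible `/v1/chat/completions` endpoint (verify this against your llama.cpp version), the request body would carry the page image inline as a base64 data URI; `build_ocr_request` is a helper name invented here:

```python
# Build an OpenAI-style chat request mirroring the CLI flags above
# (--temp 0.2, -n 1000), with the image embedded as a base64 data URI.
import base64

def build_ocr_request(image_bytes: bytes, prompt: str = "OCR this image") -> dict:
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "temperature": 0.2,
        "max_tokens": 1000,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

POST the resulting JSON to `http://localhost:8080/v1/chat/completions` after starting the server with the same `-m`/`--mmproj` pair used in the CLI examples.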

## Recommended Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `--temp` | 0.2 | Officially recommended temperature |
| `--repeat-penalty` | 1.15 | Prevents repetition (1.1-1.2 is the useful range) |
| `--repeat-last-n` | 128 | Number of recent tokens the penalty considers |
| `-n` | 1000 | Max output tokens (avoid going above 1500) |
| `-ngl` | 99 | GPU layers (offload all of them for best speed) |

### Parameter Notes

- **repeat-penalty**: Values above 1.2 may reduce OCR quality
- **-n (max tokens)**: Capping output at ~1000 tokens prevents repetition at the end of long documents
- **Image preprocessing**: Render PDFs to PNG at 1540 px on the longest edge
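The 1540 px guideline translates into a simple scale computation, since PDF pages are specified in points (1 pt = 1/72 in). A sketch of the arithmetic; the resulting zoom factor would then be handed to whatever renderer you use (e.g. PyMuPDF's transform matrix), which is an assumption and not part of this model card:

```python
# Compute a uniform zoom so the page's longest edge renders at 1540 px.
def zoom_for_longest_edge(width_pt: float, height_pt: float,
                          target_px: int = 1540) -> float:
    return target_px / max(width_pt, height_pt)

def rendered_size(width_pt: float, height_pt: float,
                  target_px: int = 1540) -> tuple:
    # Pixel dimensions of the page after applying that zoom.
    z = zoom_for_longest_edge(width_pt, height_pt, target_px)
    return round(width_pt * z), round(height_pt * z)
```

For a US-Letter page (612 × 792 pt), this renders at 1190 × 1540 px.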

## Performance (Apple M4 Max)

| Metric | Value |
|--------|-------|
| Image encoding | ~435 ms |
| Image decoding | ~45 ms |
| Prompt processing | ~1,850 tokens/s |
| Text generation | ~228 tokens/s |
| Total time (1,000 tokens) | ~8-10 s |
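A quick sanity check on these figures: generation alone accounts for roughly half of the quoted total, with image encoding, prompt processing, and per-call overhead making up the rest.

```python
# Back-of-envelope from the table above.
gen_seconds = 1000 / 228   # ~4.4 s to generate 1,000 tokens at ~228 tok/s
encode_seconds = 0.435     # image encoding (435 ms, from the table)
```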

## Quantization Details

| Format | Bits/Weight | Size Reduction | Quality Impact |
|--------|-------------|----------------|----------------|
| F16 | 16 | - | Baseline |
| Q8_0 | 8 | 45% | Nearly lossless |
| Q4_K_M | 4.5 | 66% | Minimal |
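The size-reduction column can be reproduced from the file sizes listed under "Available Files", taking the F16 file (~1100 MB) as the baseline; `reduction_pct` is a helper name used only here:

```python
# Percentage size reduction relative to the F16 baseline (sizes in MB).
def reduction_pct(size_mb: float, baseline_mb: float = 1100.0) -> int:
    return round((1 - size_mb / baseline_mb) * 100)
```

`reduction_pct(610)` gives 45 (Q8_0) and `reduction_pct(378)` gives 66 (Q4_K_M), matching the table.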

## Credits

- Original model: [lightonai/LightOnOCR-2-1B](https://huggingface.co/lightonai/LightOnOCR-2-1B)
- GGUF conversion: produced with the [llama.cpp](https://github.com/ggml-org/llama.cpp) conversion tools
- Paper: [LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model](https://arxiv.org/pdf/2601.14251)

## License

Apache License 2.0 (same as the original model)

## Citation

```bibtex
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
```