Add link to paper #42
Opened by nielsr (HF Staff)
README.md CHANGED
@@ -50,6 +50,7 @@ dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
3. **Unified and Simple Architecture:** By leveraging a single vision-language model, **dots.ocr** offers a significantly more streamlined architecture than conventional methods that rely on complex, multi-model pipelines. Switching between tasks is accomplished simply by altering the input prompt, proving that a VLM can achieve competitive detection results compared to traditional detection models like DocLayout-YOLO.
4. **Efficient and Fast Performance:** Built upon a compact 1.7B LLM, **dots.ocr** provides faster inference speeds than many other high-performing models based on larger foundations.

+ It was introduced in the paper [dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model](https://huggingface.co/papers/2512.02498).

## Usage with transformers