Update README.md
README.md
CHANGED
@@ -30,13 +30,13 @@ pipeline_tag: text-generation
 # Reasoning comes to OCR 🧠✨📄🤘
 
 **NuMarkdown-8B-Thinking** is the first reasoning OCR VLM. It is specifically trained to convert documents into clean GitHub-flavoured Markdown. It generates thoughts tokens to figure out the layout of the document before generating the Markdown file.
-It is particularly good at understanding documents with weird
+It is particularly good at understanding documents with weird layouts and complex tables. The number of thinking tokens can vary from 20% to 500% of the final answer, depending on the task difficulty.
 
-**NuMarkdown-8B-Thinking** is a fine-tune of **Qwen 2.5-VL-7B**
+**NuMarkdown-8B-Thinking** is a fine-tune of **Qwen 2.5-VL-7B** on synthetic Doc → Reasoning → Markdown examples, followed by an RL phase (GRPO) with a layout-centric reward.
 
 ## Results
 
-**NuMarkdown-8B-Thinking** is outperforming generic non-reasoning models like GPT-4o and specialized
+**NuMarkdown-8B-Thinking** is outperforming generic non-reasoning models like GPT-4o and specialized OCR models like OCRFlux.
 It is competitive against large reasoning closed-source models like Gemini 2.5.
 
 ### Arena ranking against popular alternatives (using trueskill-2 ranking system, with around 500 model-anonymized votes):