Update README.md
README.md
CHANGED
@@ -2,13 +2,16 @@
 license: mit
 base_model: Qwen/Qwen2.5-VL-7B
 tags:
+- OCR
 - vision-language
+- VLM
+- Reasoning
 - document-to-markdown
-- reinforcement-learning
-- grpo
 - qwen2.5
 - markdown
-
+- extraction
+- RAG
+model_name: NuMarkdown-8B-Thinking
 library_name: transformers
 pipeline_tag: text-generation
 ---
@@ -24,16 +27,16 @@ pipeline_tag: text-generation
 
 ---
 
-#
+# Reasoning OCR Model 📄
 
-**NuMarkdown-8B-
-It is a fine-tune of **Qwen 2.5-VL-7B** using
+**NuMarkdown-8B-Thinking** is the first reasoning OCR VLM. It is specifically trained to convert documents into clean GitHub-flavoured Markdown.
+It is a fine-tune of **Qwen 2.5-VL-7B** using synthetic Doc -> Reasoning -> Markdown examples, followed by an RL phase (GRPO) with a layout-centric reward.
 
 *(Note: the number of thinking tokens can vary from 20% to 500% the number of tokens in the final answer)*
 
 ## Results
 
-**NuMarkdown-8B-Thinking** is significantly better than similar size non-reasoning models trained for markdown generation on complex documents, and achieves competitive results against top closed source alternatives.
+**NuMarkdown-8B-Thinking** is significantly better than similar size non-reasoning models trained for markdown generation on complex documents, and achieves competitive results against top closed source alternatives.
 
 ### Arena ranking against popular alternatives (using trueskill-2 ranking system, with around 500 anonymized votes):
 <p align="center">
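The README note in this diff says the thinking tokens can range from 20% to 500% of the final answer's tokens. One practical reading is that the generation budget should cover both the thinking and the answer. The sketch below works through that arithmetic. The function name `total_token_range` and its parameters are hypothetical and not part of the model card or any library. It also assumes you already have a rough estimate of the answer length.

```python
def total_token_range(
    answer_tokens: int, low_pct: int = 20, high_pct: int = 500
) -> tuple[int, int]:
    """Estimate (min_total, max_total) generated tokens: thinking + final answer.

    low_pct and high_pct are the thinking-to-answer ratios quoted in the README
    note (20% and 500%), kept as integer percentages so the arithmetic stays
    exact. The helper itself is an illustrative assumption, not the authors' code.
    """
    if answer_tokens < 0:
        raise ValueError("answer_tokens must be non-negative")
    if not 0 <= low_pct <= high_pct:
        raise ValueError("expected 0 <= low_pct <= high_pct")

    def thinking_tokens(pct: int) -> int:
        # Integer ceiling of answer_tokens * pct / 100.
        return (answer_tokens * pct + 99) // 100

    return (
        answer_tokens + thinking_tokens(low_pct),
        answer_tokens + thinking_tokens(high_pct),
    )
```

For a page expected to produce about 1000 Markdown tokens, `total_token_range(1000)` returns `(1200, 6000)`, so a generation limit sized only for the answer would likely cut off output in the high-thinking case.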