Update README.md

README.md CHANGED

```diff
@@ -23,10 +23,10 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Markdown pa
 
 *(note: the number of thinking tokens can vary from 20% to 2× the number of tokens in the final answer)*
 
 
 ## Results
 
-(we plan to release a markdown arena -similar to llmArena- for the complex document-to-Markdown task)
+*(we plan to release a markdown arena -similar to llmArena- for the complex document-to-Markdown task)*
 
 ### Arena ranking (using the TrueSkill-2 ranking system)
 
@@ -53,7 +53,6 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Markdown pa
 
 The GRPO model wins 80% of the time against the model trained only with SFT
 
----
 
 ## Training
```
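The README above ranks models from pairwise arena duels using TrueSkill-2. TrueSkill-2 has no public reference implementation, so the sketch below uses plain Elo as a simpler stand-in to show the same idea: head-to-head wins (here, a hypothetical 8-of-10 record matching the stated 80% win rate) are folded into per-model ratings, and sorting the ratings gives the leaderboard. The model names and match counts are illustrative assumptions, not data from the README.

```python
# Minimal arena-style rating sketch. Elo is used as a stand-in for
# TrueSkill-2, which has no public reference implementation; the update
# pattern is analogous (winner's rating rises, loser's falls).

def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Update two Elo ratings after a single head-to-head comparison."""
    # Expected score of the winner under the logistic Elo model.
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Hypothetical models; both start at the same baseline rating.
ratings = {"grpo": 1000.0, "sft-only": 1000.0}

# Hypothetical duel record matching the stated 80% win rate: 8 wins in 10.
for winner in ["grpo"] * 8 + ["sft-only"] * 2:
    loser = "sft-only" if winner == "grpo" else "grpo"
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

# Leaderboard: sort models by final rating, best first.
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

With an 8-2 record the GRPO entry ends up on top; a real arena would also track rating uncertainty (as TrueSkill-style systems do) rather than a single point score.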