Alexandre-Numind committed
Commit d5aaa25 · verified · 1 Parent(s): 96a14a2

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -60,7 +60,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 ## Training
 
 1. **SFT**: One-epoch supervised fine-tune on synthetic reasoning trace generated from public PDFs (10K input/output pairs).
-2. **RL (GRPO)**: RL pahse using a structure-aware reward (5K difficults image examples).
+2. **RL (GRPO)**: RL phase using a structure-aware reward (5K difficults image examples).
 
 **Model before GRPO loose 80% time vs post GRPO model (see win-rate matrix)**
 
@@ -74,7 +74,7 @@ import torch
 from PIL import Image
 from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
 
-model_id = "NM-dev/NuMarkdown-Qwen2.5-VL"
+model_id = "Numind/NuMarkdown-reasoning"
 
 processor = AutoProcessor.from_pretrained(
 model_id,
@@ -112,7 +112,7 @@ from PIL import Image
 from vllm import LLM, SamplingParams
 from transformers import AutoProcessor
 
-model_id = "NM-dev/Qwen7B-m-5"
+model_id = "Numind/NuMarkdown-reasoning"
 
 llm = LLM(
 model=model_id,
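
Before this commit, the two README snippets pointed at different repositories. A minimal sketch of a check that would flag that kind of drift is shown below. The helper names (`find_model_ids`, `has_consistent_model_id`) and the sample strings are hypothetical and not part of the repository; the sketch assumes every snippet assigns the ID with a plain `model_id = "..."` line.

```python
import re

# Hypothetical helper, not part of the repository: matches lines of the
# form `model_id = "..."` anywhere in a README-style text.
MODEL_ID_PATTERN = re.compile(
    r'^\s*model_id\s*=\s*["\']([^"\']+)["\']', re.MULTILINE
)


def find_model_ids(text: str) -> list[str]:
    """Return every assigned model ID, in order of appearance."""
    return MODEL_ID_PATTERN.findall(text)


def has_consistent_model_id(text: str) -> bool:
    """Return True when all snippets reference at most one distinct ID."""
    return len(set(find_model_ids(text))) <= 1


# Sample inputs built from the IDs on the removed and added lines of the diff.
old_readme = (
    'model_id = "NM-dev/NuMarkdown-Qwen2.5-VL"\n'
    'model_id = "NM-dev/Qwen7B-m-5"\n'
)
new_readme = (
    'model_id = "Numind/NuMarkdown-reasoning"\n'
    'model_id = "Numind/NuMarkdown-reasoning"\n'
)

print(has_consistent_model_id(old_readme))
print(has_consistent_model_id(new_readme))
```

Run against the full README text, a check like this could sit in CI so that a future rename has to touch both the transformers and the vLLM snippets.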