Update README.md
It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t…
## Training

1. **SFT**: One-epoch supervised fine-tune on synthetic reasoning traces generated from public PDFs (10K input/output pairs).

2. **RL (GRPO)**: RL phase using a structure-aware reward (5K difficult image examples).
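A structure-aware reward can be thought of as comparing structural features of the generated Markdown (headings, table rows, list items) against a reference. The heuristics below are an illustrative sketch under that assumption — `structure_reward` and its feature set are hypothetical, not the reward actually used in training:

```python
import re


def structure_reward(prediction: str, reference: str) -> float:
    """Score in [0, 1]: how closely the predicted Markdown matches
    the reference's structural feature counts (hypothetical sketch)."""

    def features(md: str) -> dict:
        # Count coarse structural elements of a Markdown string.
        return {
            "headings": len(re.findall(r"^#+\s", md, flags=re.M)),
            "table_rows": len(re.findall(r"^\|.*\|$", md, flags=re.M)),
            "list_items": len(re.findall(r"^\s*[-*]\s", md, flags=re.M)),
        }

    f_pred, f_ref = features(prediction), features(reference)
    score = 0.0
    for key in f_ref:
        a, b = f_pred[key], f_ref[key]
        # Penalize the relative mismatch of each structural feature.
        score += 1.0 - abs(a - b) / max(a, b, 1)
    return score / len(f_ref)
```

A reward of this shape gives partial credit for partially correct structure, which is what GRPO needs to rank sampled completions against each other.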
**The pre-GRPO model loses 80% of the time against the post-GRPO model (see the win-rate matrix).**
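The 80% figure corresponds to a pairwise win rate over head-to-head judgments between the two checkpoints, with ties excluded. A minimal sketch of that computation (illustrative only, not the actual evaluation code):

```python
def win_rate(outcomes: list[str]) -> float:
    """Fraction of non-tie comparisons won by model A.

    outcomes: per-example judgments, each "A", "B", or "tie".
    """
    wins = outcomes.count("A")
    losses = outcomes.count("B")
    decided = wins + losses
    # Ties carry no preference signal, so they are excluded.
    return wins / decided if decided else 0.0
```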
Transformers example:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Numind/NuMarkdown-reasoning"

processor = AutoProcessor.from_pretrained(
    model_id,
    # ...
)
```
vLLM example:

```python
from PIL import Image
from vllm import LLM, SamplingParams
from transformers import AutoProcessor

model_id = "Numind/NuMarkdown-reasoning"

llm = LLM(
    model=model_id,
    # ...
)
```