Update README.md

Files changed (1) hide show

README.md CHANGED Viewed

@@ -63,7 +63,6 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to
 1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
 2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
 ## Example:
 <p align="center">
@@ -252,5 +251,9 @@ enc = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
 with torch.no_grad():
     out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
-print(processor.decode(out[0].split("<answer>")[1].split("</answer>")[0], skip_special_tokens=True))
-```

 1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
 2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
 ## Example:
 <p align="center">
 with torch.no_grad():
     out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
+out = processor.decode(out[0])
+reasoning = out.split("<thinking>")[1].split("</thinking>")[0]
+answer  = out.split("<answer>")[1].split("</answer>")[0]
+```