Alexandre-Numind commited on
Commit
fff3e91
·
verified ·
1 Parent(s): c5f6c23

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +6 -3
README.md CHANGED
@@ -63,7 +63,6 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to
63
  1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
64
  2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
65
 
66
-
67
  ## Example:
68
 
69
  <p align="center">
@@ -252,5 +251,9 @@ enc = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
252
  with torch.no_grad():
253
  out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
254
 
255
- print(processor.decode(out[0].split("<answer>")[1].split("</answer>")[0], skip_special_tokens=True))
256
- ```
 
 
 
 
 
63
  1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
64
  2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
65
 
 
66
  ## Example:
67
 
68
  <p align="center">
 
251
  with torch.no_grad():
252
  out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
253
 
254
+ out = processor.decode(out[0])
255
+
256
+ reasoning = out.split("<thinking>")[1].split("</thinking>")[0]
257
+ answer = out.split("<answer>")[1].split("</answer>")[0]
258
+ ```
259
+