Update README.md
Browse files
README.md
CHANGED
|
@@ -63,7 +63,6 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to
|
|
| 63 |
1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
|
| 64 |
2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
|
| 65 |
|
| 66 |
-
|
| 67 |
## Example:
|
| 68 |
|
| 69 |
<p align="center">
|
|
@@ -252,5 +251,9 @@ enc = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
|
|
| 252 |
with torch.no_grad():
|
| 253 |
out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
|
| 254 |
|
| 255 |
-
|
| 256 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 63 |
1. **SFT**: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
|
| 64 |
2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
|
| 65 |
|
|
|
|
| 66 |
## Example:
|
| 67 |
|
| 68 |
<p align="center">
|
|
|
|
| 251 |
with torch.no_grad():
|
| 252 |
out = model.generate(**enc, temperature = 0.7, max_new_tokens=5000)
|
| 253 |
|
| 254 |
+
out = processor.decode(out[0])
|
| 255 |
+
|
| 256 |
+
reasoning = out.split("<thinking>")[1].split("</thinking>")[0]
|
| 257 |
+
answer = out.split("<answer>")[1].split("</answer>")[0]
|
| 258 |
+
```
|
| 259 |
+
|