Alexandre-Numind committed on
Commit b0a7483 · verified · 1 parent: ae500bf

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -61,7 +61,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
  ## Training
 
  1. **SFT**: One-epoch supervised fine-tune on synthetic reasoning trace generated from public PDFs (10K input/output pairs).
- 2. **RL (GRPO)**: RL phase using a structure-aware reward (5K difficults image examples).
+ 2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficults image examples).
 
  **Model before GRPO loose 80% time vs post GRPO model (see win-rate matrix)**
 
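The GRPO phase renamed in this commit scores a group of completions sampled for the same input and trains on each completion's reward relative to the rest of its group. Below is a minimal sketch of that group-relative advantage, assuming the common formulation (reward minus the group mean, divided by the group standard deviation). The layout-centric reward itself is not described in this commit, so rewards are plain floats here, and `group_relative_advantages` is a hypothetical helper, not code from this repository.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalize each completion's reward against its group.

    `rewards` holds one scalar per completion sampled for the SAME input
    (e.g. one document image). Uses the population standard deviation;
    some implementations use the sample estimate instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward,
    # in which case all advantages are 0 and the group contributes no signal.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative rewards for four sampled completions of one input.
    group = [0.2, 0.5, 0.9, 0.4]
    print(group_relative_advantages(group))
```

Completions scoring above their group's mean get a positive advantage and are reinforced; those below are pushed down. This is why a reward that separates good from bad layouts within a group matters more than its absolute scale.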