Thrillcrazyer committed on
Commit 647d317 · verified · 1 Parent(s): 01ce37f

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -41,7 +41,7 @@ licence: license
  frameworks encourages the policy model to improve the structural quality of reasoning. Consequently, this leads to
  consistent performance improvements over existing sparse reward frameworks.
 
- # Illustration of PM4GRPO
+ # Illustration of TACReward
 
  <div align="center">
  <img src="https://arxiv.org/html/2510.25065v1/x1.png" width="600"/>