Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ model_name: TreeRPO-Qwen2.5-Math-1.5B
 A 1.5B parameter math reasoning model fine-tuned with **TreeRPO**, a hierarchical extension of GRPO that assigns rewards to “thought” nodes (not just full completions). Achieves higher GSM8K accuracy with just ~10K supervised + RL examples and **no reward model**.
 
 🔎 **Full write-up (method, math, analysis):**
-[TreeRPO: Hierarchical Credit Assignment for
+[TreeRPO: Hierarchical Credit Assignment for Reasoning in Language Models](https://omrisapir.substack.com/publish/post/167273414)
 
 ---
 