Update README.md
README.md
CHANGED
@@ -41,22 +41,17 @@ HIPO has two main components:
 
 # Experimental Findings
 
-**Think-on Only
-Training
+**Think-on Only (Overthinking).**
+Training only on Think-on data makes the model reason on all problems, causing inefficiency.
 
-**GRPO
-
+**GRPO.**
+Improves accuracy by **+3.1%**, but increases token length on simple tasks.
 
 **Think-on/Think-off Mix.**
-
+Yields higher accuracy (**+4.0%**) while reducing token length (**–10.8%**) and thinking rate (**–22%**).
 
 **HiPO Advantage.**
-
-- **Accuracy: +6.2%**
-- **Token length: –30%**
-- **Thinking rate: –39%**
-
-Overall, HiPO outperforms existing methods in both **efficiency** and **accuracy**.
+Achieves the best results: **+6.2% accuracy**, **–30% token length**, **–39% thinking rate**, outperforming existing methods in both **efficiency** and **accuracy**.
 
 
 