Update README.md
Browse files
README.md
CHANGED
|
@@ -22,7 +22,7 @@ tags:
|
|
| 22 |
|
| 23 |
One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
|
| 24 |
|
| 25 |
-
Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse
|
| 26 |
|
| 27 |
|
| 28 |
## ✨ Key Highlights
|
|
|
|
| 22 |
|
| 23 |
One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
|
| 24 |
|
| 25 |
+
Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse solutions to a single problem, enhancing their exposure to varied reasoning patterns and mitigating overfitting. This exposes the LLMs to multiple perspectives and error types, thereby more effectively unleashing their reasoning potential.
|
| 26 |
|
| 27 |
|
| 28 |
## ✨ Key Highlights
|