ubowang commited on
Commit
ff943eb
·
verified ·
1 Parent(s): 0636776

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ tags:
22
 
23
  One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
24
 
25
- Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse incorrect solutions to a single problem, enhancing their exposure to varied reasoning patterns and mitigating overfitting. This method promotes deeper reasoning and stronger generalization—especially in compute- and data-constrained settings.
26
 
27
 
28
  ## ✨ Key Highlights
 
22
 
23
  One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
24
 
25
+ Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse solutions to a single problem, enhancing their exposure to varied reasoning patterns and mitigating overfitting. This exposes the LLMs to multiple perspectives and error types, thereby more effectively unleashing their reasoning potential.
26
 
27
 
28
  ## ✨ Key Highlights