Update README.md
README.md
CHANGED
@@ -44,7 +44,7 @@ Instead of learning from reference answers (as in supervised fine-tuning) or rew
 One-shot CFT consistently improves mathematical and logical reasoning.
 <strong>Left:</strong> Average accuracy on six mathematical reasoning benchmarks for Qwen and LLaMA models, comparing base, SFT, RLVR, and CFT with only one training example.
 <strong>Right:</strong> In-domain accuracy on three logic reasoning benchmarks (BBEH subtasks) for Qwen2.5-Math-7B.
-Across both domains, CFT with a single problem significantly outperforms standard
+Across both domains, CFT with a single problem significantly outperforms standard SFT and matches or exceeds reinforcement learning with much lower compute.
 </em></p>
 
 