Update README.md with research/reproducibility notes
README.md CHANGED
@@ -18,6 +18,11 @@ This model was produced using **Simple Self-Distillation (SSD)**, a method that
|
 - **Self-distillation sampling:** temperature=1.1, top_p=0.95, top_k=20
 - **Evaluation sampling:** temperature=0.7, top_p=0.95, top_k=20
 
+## Notes
+
+- These are research checkpoints for reproducibility.
+- They are not optimized Qwen releases.
+- They don't represent a broader open-source model strategy.
+
 ## Method
 
 SSD samples solutions from the base model using non-unit temperature and top-k/top-p truncation, then fine-tunes on those samples via standard supervised learning. Despite its simplicity, SSD yields large gains on competitive programming benchmarks, with improvements concentrating on harder problems. The mechanism traces to resolving a *precision–exploration conflict*: SSD reshapes token distributions in a context-dependent way so that a single global decoding configuration becomes far more effective at evaluation time.
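The decoding setup the README describes (temperature scaling plus top-k/top-p truncation, e.g. temperature=1.1, top_k=20, top_p=0.95 for self-distillation sampling) can be sketched in plain NumPy. This is a minimal illustration of the truncated-sampling step only, not the SSD training code; the function name `truncated_sample` and the toy logits are hypothetical.

```python
import numpy as np

def truncated_sample(logits, temperature=1.1, top_k=20, top_p=0.95, rng=None):
    """Sample one token id using temperature scaling plus top-k/top-p truncation."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Temperature scaling: >1 flattens the distribution, <1 sharpens it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k truncation: keep only the k highest-scoring tokens, sorted descending.
    order = np.argsort(scaled)[::-1][:top_k]
    probs = np.exp(scaled[order] - scaled[order].max())
    probs /= probs.sum()
    # Nucleus (top-p) truncation: keep the smallest prefix of the sorted
    # distribution whose cumulative mass reaches top_p, then renormalize.
    cutoff = int(np.searchsorted(np.cumsum(probs), top_p)) + 1
    kept = order[:cutoff]
    kept_probs = probs[:cutoff] / probs[:cutoff].sum()
    return int(rng.choice(kept, p=kept_probs))

# Toy 4-token vocabulary: the low-probability tail is cut off before sampling.
token = truncated_sample([5.0, 2.0, 1.0, -3.0], temperature=1.1, top_k=3, top_p=0.95)
```

In SSD, samples drawn this way from the base model would then be used as ordinary supervised fine-tuning targets; the evaluation-time configuration differs only in its lower temperature (0.7).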