# OPSD Experiment Results

Reproduction of [OPSD (On-Policy Self-Distillation)](https://github.com/siyan-zhao/OPSD) on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.

## Results (Avg@12)

### Qwen3-1.7B
| Method | AIME24 | AIME25 | HMMT25 |
|--------|:------:|:------:|:------:|
| Base | 47.2% | 35.3% | 21.9% |
| OPSD (best) | 49.2% | 37.5% | 24.4% |
| SFT (best) | 37.5% | 30.8% | 19.2% |
| GRPO (best) | 47.8% | 35.0% | 22.8% |

### Qwen3-4B
| Method | AIME24 | AIME25 | HMMT25 |
|--------|:------:|:------:|:------:|
| Base | 71.1% | 60.0% | 38.6% |
| OPSD (best) | 62.2% | 57.2% | 34.2% |
| SFT (best) | 62.5% | 58.1% | 33.3% |
| GRPO (best) | 68.9% | 65.0% | 41.9% |

### Qwen3-8B
| Method | AIME24 | AIME25 | HMMT25 |
|--------|:------:|:------:|:------:|
| Base | 72.8% | 61.7% | 38.6% |
| OPSD (best) | 69.4% | 63.3% | 38.6% |
| SFT (best) | 69.2% | 60.3% | 36.1% |
| GRPO (best) | 72.2% | 65.8% | 40.8% |
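
The tables are easier to compare as differences from the Base row. The snippet below is a small sketch that copies the numbers from the tables above and prints each method's change in percentage points relative to Base. The names `RESULTS`, `BENCHMARKS`, and `deltas_vs_base` are illustrative and not part of the OPSD codebase, and the "(best)" suffix is dropped from the method keys for brevity.

```python
# Scores copied from the tables above, in percent (Avg@12).
BENCHMARKS = ("AIME24", "AIME25", "HMMT25")

RESULTS = {
    "Qwen3-1.7B": {
        "Base": (47.2, 35.3, 21.9),
        "OPSD": (49.2, 37.5, 24.4),
        "SFT": (37.5, 30.8, 19.2),
        "GRPO": (47.8, 35.0, 22.8),
    },
    "Qwen3-4B": {
        "Base": (71.1, 60.0, 38.6),
        "OPSD": (62.2, 57.2, 34.2),
        "SFT": (62.5, 58.1, 33.3),
        "GRPO": (68.9, 65.0, 41.9),
    },
    "Qwen3-8B": {
        "Base": (72.8, 61.7, 38.6),
        "OPSD": (69.4, 63.3, 38.6),
        "SFT": (69.2, 60.3, 36.1),
        "GRPO": (72.2, 65.8, 40.8),
    },
}


def deltas_vs_base(model: str) -> dict[str, tuple[float, ...]]:
    """Return each method's score minus the Base score, in percentage points."""
    base = RESULTS[model]["Base"]
    return {
        method: tuple(round(score - b, 1) for score, b in zip(scores, base))
        for method, scores in RESULTS[model].items()
        if method != "Base"
    }


if __name__ == "__main__":
    for model in RESULTS:
        print(model)
        for method, deltas in deltas_vs_base(model).items():
            cells = ", ".join(
                f"{name} {delta:+.1f}" for name, delta in zip(BENCHMARKS, deltas)
            )
            print(f"  {method}: {cells}")
```

A positive value means the method scored above Base on that benchmark, and a negative value means it scored below.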

## Setup
- Training (all methods): learning rate 5e-6, batch size 32, LoRA (r=64, alpha=128), 200 steps
- Eval: `val_n=12` (12 sampled responses per problem, matching Avg@12), temperature 1.0, thinking mode enabled
- Data: `siyanzhao/Openthoughts_math_30k_opsd`
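
This README does not define Avg@12. The sketch below assumes the usual reading: sample 12 responses per problem, grade each as correct or incorrect, take the per-problem accuracy, and average across problems. The function name `avg_at_k` and the boolean grading matrix are illustrative. How each response is graded is outside this sketch and is not taken from the OPSD evaluation code.

```python
def avg_at_k(correct: list[list[bool]]) -> float:
    """Average accuracy over problems, each with k sampled responses.

    correct[i][j] is True if sample j for problem i was graded correct.
    Assumes every problem has at least one sample.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # Fraction of this problem's samples that were graded correct.
        per_problem.append(sum(samples) / len(samples))
    # Unweighted mean across problems.
    return sum(per_problem) / len(per_problem)


if __name__ == "__main__":
    # Toy example with k=12: problem A has 6 correct samples, problem B has 3.
    problem_a = [True] * 6 + [False] * 6
    problem_b = [True] * 3 + [False] * 9
    print(f"Avg@12 = {avg_at_k([problem_a, problem_b]):.1%}")
```

With the same number of samples for every problem, this equals the overall fraction of correct samples.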

## Reference
[Self-Distilled Reasoner: On-Policy Self-Distillation for LLMs](https://arxiv.org/pdf/2601.18734v3)