Qwen2.5-3B-PROSPER / README.md
license: apache-2.0

Model Card for Qwen2.5-3B-PROSPER

This model improves the instruction-following capabilities of Qwen-2.5-3B-Instruct via preference tuning on the WildChecklists dataset. Full details are provided in the paper "Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning".

Evaluation

We report performance on instruction-following and general-chat benchmarks, using GPT-5-mini as the judge. Additional evaluation details and settings are provided in the paper. Results on AlpacaEval and Arena-Hard:

| Model | AlpacaEval (Vanilla) | AlpacaEval (Length-Controlled) | Arena-Hard (Vanilla) | Arena-Hard (Style-Controlled) |
|---|---|---|---|---|
| Qwen-2.5-3B-Instruct | 24.1 | 12.79 | 20.6 | 19.6 |
| + PROSPER | 35.8 | 20.94 | 26.0 | 24.6 |
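
To make the size of each improvement easier to read off, the short sketch below computes the absolute and relative gains from the scores in the table above. It is only illustrative arithmetic over the reported numbers: the names `scores` and `gains` are hypothetical helpers, not part of any released code, and the sketch does not reproduce the evaluation itself.

```python
# Scores copied from the table above as (base model, base model + PROSPER).
scores = {
    "AlpacaEval (Vanilla)": (24.1, 35.8),
    "AlpacaEval (Length-Controlled)": (12.79, 20.94),
    "Arena-Hard (Vanilla)": (20.6, 26.0),
    "Arena-Hard (Style-Controlled)": (19.6, 24.6),
}


def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of the base score)."""
    absolute = round(tuned - base, 2)
    relative = round(100 * (tuned - base) / base, 1)
    return absolute, relative


# Hypothetical summary structure: benchmark name -> (absolute, relative) gain.
results = {name: gains(base, tuned) for name, (base, tuned) in scores.items()}

for name, (absolute, relative) in results.items():
    print(f"{name}: +{absolute} points ({relative}% relative to the base model)")
```

The absolute gains range from 5.0 points on Arena-Hard (Style-Controlled) to 11.7 points on AlpacaEval (Vanilla).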