license: apache-2.0

Model Card for Model ID

This model improves the instruction-following capabilities of Qwen-2.5-3B-Instruct through preference tuning on the WildChecklists dataset. Full details are provided in the paper "Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning".

Evaluation

We report performance on instruction-following and general-chat benchmarks, using GPT-5-mini as the judge. Additional evaluation details and settings are provided in the paper.

Results on AlpacaEval and Arena-Hard:

| Model | AlpacaEval (Vanilla) | AlpacaEval (Length-Controlled) | Arena-Hard (Vanilla) | Arena-Hard (Style-Controlled) |
| --- | --- | --- | --- | --- |
| Qwen-2.5-3B-Instruct | 24.1 | 12.79 | 20.6 | 19.6 |
| + PROSPER | 35.8 | 20.94 | 26.0 | 24.6 |
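
To make the size of the improvement easier to read, the following minimal sketch computes the absolute and relative gain of the tuned model over the base model on each benchmark. The scores are copied from the table above; the `scores` dictionary and the `gains` helper are illustrative names, not part of any released code.

```python
# Scores copied from the table above as (base model, + PROSPER).
# The dictionary layout and helper name are illustrative assumptions.
scores = {
    "AlpacaEval (Vanilla)": (24.1, 35.8),
    "AlpacaEval (Length-Controlled)": (12.79, 20.94),
    "Arena-Hard (Vanilla)": (20.6, 26.0),
    "Arena-Hard (Style-Controlled)": (19.6, 24.6),
}


def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of base)."""
    absolute = tuned - base
    relative = 100.0 * absolute / base
    return absolute, relative


for name, (base, tuned) in scores.items():
    absolute, relative = gains(base, tuned)
    print(f"{name}: +{absolute:.2f} points ({relative:.1f}% relative to base)")
```

Reporting both figures is a deliberate choice: an absolute gain is easier to compare across benchmarks, while a relative gain shows how large the change is against a low starting score.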