# Model Card for Model ID
This model improves the instruction-following capabilities of Qwen-2.5-7B-Instruct via preference tuning on the WildChecklists dataset. Full details are provided in *Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning*.
## Evaluation
We report performance on instruction-following and general-chat benchmarks, using GPT-5-mini as the judge. Additional evaluation details and settings are provided in the paper.

**AlpacaEval / Arena-Hard:**
| Model | AlpacaEval (Vanilla) | AlpacaEval (Length-Controlled) | Arena-Hard (Vanilla) | Arena-Hard (Style-Controlled) |
|---|---|---|---|---|
| Qwen-2.5-7B-Instruct | 37.1 | 25.32 | 42.4 | 44.2 |
| + PROSPER | 55.4 | 37.61 | 49.2 | 46.1 |
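Since the checkpoint is a fine-tune of Qwen-2.5-7B-Instruct, it should load through the standard `transformers` chat-template API. A minimal sketch follows; the repo id is a hypothetical placeholder, since this card does not state the model's actual Hub id.

```python
# Minimal inference sketch using Hugging Face transformers.
# "your-org/Qwen2.5-7B-Instruct-PROSPER" is a hypothetical repo id --
# replace it with this model's actual Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Qwen2.5-7B-Instruct-PROSPER"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt with the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three uses of preference tuning."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Sampling parameters (temperature, top-p) were not specified in this card; `generate` defaults are used above.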