# Model Card for Model ID

This model improves the instruction-following capabilities of Qwen-2.5-7B-Instruct via preference tuning on the WildChecklists dataset. Full details are provided in the paper *Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning*.
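A minimal usage sketch with the Hugging Face `transformers` library is shown below. The repository id is a placeholder (the model id is not stated in this card); substitute the actual id when loading.

```python
# Minimal inference sketch using transformers' chat-template API.
# NOTE: "your-org/your-model" is a placeholder repository id, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder: replace with the actual model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "List three tips for writing clear instructions."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the model is Qwen-2.5-based, the standard chat template shipped with the tokenizer should apply without extra configuration.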

## Evaluation

We report performance on instruction-following and general-chat benchmarks (AlpacaEval and Arena-Hard), using GPT-5-mini as the judge. Additional evaluation details and settings are provided in the paper.

| Model | AlpacaEval (Vanilla) | AlpacaEval (Length-Controlled) | Arena-Hard (Vanilla) | Arena-Hard (Style-Controlled) |
|---|---|---|---|---|
| Qwen-2.5-7B-Instruct | 37.1 | 25.32 | 42.4 | 44.2 |
| + PROSPER | 55.4 | 37.61 | 49.2 | 46.1 |
Model size: 8B parameters · Tensor type: BF16 · Format: Safetensors