# Model Card for Model ID
This model improves the instruction-following capabilities of Qwen-2.5-7B-Instruct via preference tuning on the WildChecklists dataset. Full details are provided in *Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning*.
## Evaluation
We report performance on instruction-following and general-chat benchmarks, using GPT-5-mini as the judge. Additional evaluation details and settings are provided in the paper.

**AlpacaEval / Arena-Hard:**
| Model | AlpacaEval (Vanilla) | AlpacaEval (Length-Controlled) | Arena-Hard (Vanilla) | Arena-Hard (Style-Controlled) |
|---|---|---|---|---|
| Qwen-2.5-7B-Instruct | 37.1 | 25.32 | 42.4 | 44.2 |
| + PROSPER | 55.4 | 37.61 | 49.2 | 46.1 |
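Since the checkpoint is a fine-tune of Qwen-2.5-7B-Instruct, it should load through the standard `transformers` chat-template API. A minimal sketch follows; the repo id is a hypothetical placeholder, since this card does not state the model's actual Hub id.

```python
# Minimal inference sketch using Hugging Face transformers.
# "your-org/Qwen2.5-7B-Instruct-PROSPER" is a hypothetical repo id --
# replace it with this model's actual Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Qwen2.5-7B-Instruct-PROSPER"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt with the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three uses of preference tuning."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Sampling parameters (temperature, top-p) were not specified in this card; `generate` defaults are used above.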