qwen3.5-hard-only-r4

Summary

  • Base model: Qwen/Qwen3.5-4B

OOD Evaluation

benchmark             n     auroc   accuracy
arc_challenge         1000  0.8875  0.8890
judge_bench           278   0.7065  0.6583
mmlu                  1000  0.7550  0.7680
mmlu_pro              1000  0.6889  0.7070
rod101_essay_scoring  81    0.7115  0.7407
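
The card does not say how the auroc and accuracy columns were computed. As a minimal sketch, assuming each benchmark example has a binary label (1 or 0) and the model emits one scalar score per example, the two metrics can be computed as below. The function names, the 0.5 decision threshold, and the toy data are all assumptions made for illustration; none of them come from this repository.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption, not a value taken from the card.
    """
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


# Toy data, made up for illustration only (not drawn from any benchmark).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_accuracy = accuracy(toy_labels, toy_scores)
print(f"auroc={toy_auroc:.4f} accuracy={toy_accuracy:.4f}")
```

AUROC depends only on how the scores rank the examples, while accuracy depends on the chosen threshold. That is one reason the two columns can disagree in ordering, as they do for judge_bench and rod101_essay_scoring.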

MMLU AUROC with tuning (by amount of data used to train)

[Figure: auroc_vs_train_pct]

[Figure: auroc_vs_train_n]

MMLU accuracy with tuning (by amount of data used to train)

[Figure: accuracy_vs_train_pct]

[Figure: accuracy_vs_train_n]

Model tree for modaic/Qwen3.5-4B-probe

  • Finetuned from: Qwen/Qwen3.5-4B
  • This model: adapter (one of 72 adapters listed for the base model)