🕵️🛡️ Evaluation Meta Knowledge
2026 arXiv preprint. Models fine-tuned on documents describing typical evaluation traits show safer behavior by having increased refusal rates and low…

Models That Know How Evaluations Are Designed Score Safer
Paper • 2605.28591 • Published 2 days ago • 4

compass-group-tue/sdf_evaluation_traits
Updated about 1 hour ago • 31 • 1