Multi-Turn Evaluation Benchmarks
A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction
passing2961/MultiVerse Viewer • Updated Nov 1, 2025 • 647 • 101 • 1
passing2961/photochat_plus Viewer • Updated Dec 3, 2024 • 968 • 90 • 4
RefineBench/RefineBench Viewer • Updated Dec 2, 2025 • 1k • 1.47k • 5
Thanos Skill-of-Mind-Infused LLM
passing2961/Thanos-1B 1B • Updated Nov 8, 2024 • 17
passing2961/Thanos-3B 3B • Updated Nov 8, 2024 • 3 • 4
passing2961/Thanos-8B 8B • Updated Nov 8, 2024 • 3 • 3
passing2961/multifaceted-skill-of-mind Viewer • Updated Nov 8, 2024 • 100k • 72 • 5
passing2961/finch_8b_kto_held_out_expr_purpose_qwen_max16384_kto_5.0e-7_1.0_train42_cosine Text Generation • 8B • Updated about 12 hours ago • 25
passing2961/finch_27b_hard_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Image-Text-to-Text • 3.05M • Updated 1 day ago • 11
passing2961/finch_4b_kto_held_out_expr_purpose_qwen_max8192_kto_5.0e-7_1.0_train42_cosine Image-Text-to-Text • 5B • Updated 1 day ago • 179
passing2961/finch_8b_soft_with_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Text Generation • 8B • Updated 1 day ago • 9
passing2961/finch_4b_soft_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Image-Text-to-Text • 5B • Updated 1 day ago • 11
passing2961/finch_4b_hard_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Image-Text-to-Text • 5B • Updated 1 day ago • 447
passing2961/finch_8b_soft_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Text Generation • 8B • Updated 2 days ago • 104
passing2961/finch_8b_hard_with_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine Text Generation • 8B • Updated 2 days ago • 13