PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection Paper • 2510.23594 • Published Oct 27, 2025