VQA image pairs for fine-grained, multi-class failure detection, annotated with multi-step reasoning traces. Features a simulated Franka Emika Panda robot arm.