ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
Submitted by Difan Jiao, University of Toronto CSSLab