refactor: Task3 reward model changed, agent adjusted for new model 48661cd ajaxwin committed 28 days ago