fix(ui): remove vestigial math dataset generation, sync evaluation script to 120 steps, and truncate metrics log on run 7918944
Rithwik Ravi commited on