Live training visualization + aggressive reward shaping to prevent 'do nothing' collapse (4dde8b9, hiitsesh committed on Apr 25)