feat: benchmark multiple models and validate improved results 7b233d3 yassinekolsi committed on Jan 26