feat: benchmark multiple models and validate improved results 7b233d3 yassinekolsi committed 4 days ago