feat: benchmark multiple models and validate improved results 7b233d3 yassinekolsi committed 25 days ago