feat: benchmark multiple models and validate improved results 7b233d3 yassinekolsi committed 8 days ago