Evaluation sloppiness / benchmark cheating?

#9
by jaens - opened

What's the official response to "Debunking the Claims of K2-Think" (SRI Lab)?

not surprised at all. many small models claim they can beat much bigger ones, but it's just benchmark
