Evaluation sloppiness / benchmark cheating?
#9 opened by jaens
not surprised at all. many small models claim they can beat much bigger ones, but it's just benchmark