Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 4 days ago • 37
Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework 7 days ago • 12
Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation 2 days ago • 9
Building Tucano 2: Open-Source Language Models That Actually _Think_ in Portuguese 10 days ago • 10