Article Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do • 2 days ago • 33
Article MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning • 3 days ago • 14
Article Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework • 5 days ago • 12
Article Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? • 17 days ago • 17
FINAL Bench Collection World's First Functional Metacognition Benchmark. "Not how much AI knows, but whether it knows what it doesn't know, and can fix it." • 2 items • Updated 19 days ago • 4