Article What’s missing for AGI in today’s tech trajectories — and what we should work on next Sep 3 • 2
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Mar 17 • 348
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7, 2024 • 29
Large Language Model Confidence Estimation via Black-Box Access Paper • 2406.04370 • Published Jun 1, 2024 • 22