LLM Task Underspecification Detection: Evaluate gendered pronoun resolution in text. (Featured, 9 likes, runtime error)
Open LLM Leaderboard: Track, rank and evaluate open LLMs and chatbots. (13.9k likes, running on CPU Upgrade)
Color Coded Text Generation: Generate color-coded text based on token probabilities. (46 likes, runtime error)
uncertainty-calibration: Explore and calibrate model predictions to better understand probabilities. (7 likes, running)