A Comprehensive Survey of Evaluation Techniques for Recommendation Systems • Paper • 2312.16015 • Published Dec 26, 2023
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation • Paper • 2409.06820 • Published Sep 10, 2024
Can You Run It? LLM version • Determine GPU requirements for running large language models
Open LLM Leaderboard best models ❤️🔥 • Collection • A daily uploaded list of models with the best evaluations on the LLM leaderboard • 65 items • Updated Mar 20, 2025