AfroBench
Comprehensive benchmark of LLMs on African Languages
computational linguistics, natural language processing
LLM2Vec-Gen: Generative Embeddings from Large Language Models
Humans and LLMs Diverge on Probabilistic Inferences
Leaderboard for mSTEB benchmark
Visualize web interaction recordings
Leaderboard for AgentRewardBench
Explore agent trajectories and judgments in web benchmarks
SafeArena Leaderboard