Mezura
🥇
Explore and compare LLM benchmark results
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval