Collections
Discover the best community collections!
Collections trending this week
- Digital Socrates: Evaluating LLMs through explanation critiques
  Paper • 2311.09613 • Published • 1
- gorilla-llm/APIBench
  Updated • 1.31k • 74
- PromptBench: A Unified Library for Evaluation of Large Language Models
  Paper • 2312.07910 • Published • 16
- Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
  Paper • 2412.09645 • Published • 36