Collections
Discover the best community collections!
Collections trending this week
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
  Paper • 2311.12022 • Published • 36
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 245
- gorilla-llm/APIBench
  Updated • 1.16k • 74
- Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
  Paper • 2312.04724 • Published • 21