EvalEval Coalition

community

https://evalevalai.com/

evaluatingevals

AI & ML interests

We’re building EvalEval, a research coalition on evaluating evaluations, hosted by Hugging Face, the University of Edinburgh, and EleutherAI.

Recent Activity

evijit updated a bucket about 10 hours ago

evaleval/general-eval-card-storage

j-chim updated a dataset about 15 hours ago

evaleval/entity-registry-data

Cerru02 submitted a paper about 18 hours ago

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Papers

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Articles

Introducing Evaluation Cards: A Live Interpretive Layer for Understanding the AI Evaluations Ecosystem

AI evals are becoming the new compute bottleneck

j-chim updated a dataset about 15 hours ago

evaleval/entity-registry-data

Viewer • Updated about 15 hours ago • 229k • 928 • 1

Cerru02 submitted a paper to Daily Papers about 18 hours ago

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

Paper • 2607.07953 • Published 3 days ago • 5

updated a dataset about 18 hours ago

evaleval/card_backend

Updated about 18 hours ago • 2.88k • 1

in evaleval/EEE_datastore 1 day ago

Add PromptSuite evaluation data

#145 opened about 1 month ago by

in evaleval/EEE_datastore 1 day ago

hf download hf://datasets/evaleval/EEE_datastore@192329fb7d6b15b7b0936a1a58ae862aa7e8ba24/flat/objects/04/c7/04c77f44-dff1-4cc7-80a1-35f4e9e9237f.json

#169 opened 3 days ago by

in evaleval/EEE_datastore 3 days ago

Delete manifest.json

#171 opened 3 days ago by

updated a bucket 3 days ago

evaleval/entity-registry-storage

in evaleval/EEE_datastore 5 days ago

[Submission] Add TELBench (DRIFT reproduction) results

#168 opened 5 days ago by

authored a paper 5 days ago

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Paper • 2606.14516 • Published 29 days ago • 5

in evaleval/EEE_datastore 7 days ago

[Submission] Add moru results for anthropic/claude-haiku-4-5-20251001, anthropic/claude-sonnet-4-5-20250929, anthropic/claude-sonnet-4-6, deepseek-ai/DeepSeek-V3.2, google/gemini-2.5-flash-lite, google/gemini-3.5-flash, grok/grok-4-1-fast-non-reasoning, openai/gpt-5-mini-2025-08-07, openai/gpt-5.2-2025-12-11

#167 opened 7 days ago by

[Submission] Add ANIMA and TAC (CaML animal-welfare benchmarks)

#166 opened 7 days ago by

in evaleval/EEE_datastore 9 days ago

[ACL Shared Task] Add AlpacaEval

#129 opened 2 months ago by muhammadravi251001

updated a Space 9 days ago

eval-card-registry

in evaleval/EEE_datastore 9 days ago

[Submission] Add long-context-code-retrieval eval

#165 opened 9 days ago by

authored a paper 10 days ago

No Safe Dose: How Training Data Drives Unsafe Image Generation

Paper • 2605.28137 • Published May 27 • 2

in evaleval/EEE_datastore 10 days ago

Shared Task - Submission

#136 opened about 2 months ago by