Merge cross-source benchmark families; tidy leaderboard panel + table chrome 8ef4cbc evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Restore curated benchmark families; polish frontier panel UX ca20f78 evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Live snapshot date, hide empty Updated col, clean slice contamination cb0ce7c evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Humanize family names whose display matches the key under different separators b763f91 evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Make /models tables column-sortable; rebalance /evals + /models toolbars 5a2d59c evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Clean up Source column and per-row dataset label noise eec1852 evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Hide subtask-scope metrics from chips by default in matrix view 4cb8b56 evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Render score-distribution metric picker as chips, not a dropdown 1303965 evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Treat single-root-metric subtask evals as slice-pickable, not matrix 4ac3a9b evijit HF Staff Claude Opus 4.7 (1M context) committed 21 days ago
Move split selector below the reporting comparison heading 629a612 evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Fix sort toggle direction and remove categories as sortable column c9c5a30 evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Sort evals list by family name; add sortable columns; use cleaned display names 919a75f evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Fix ranks-high/low-in using only sidecar ordinal data 970fdbe evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Wire search bar to overlaps table and hide chips in overlaps view 0f5fb5f evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Compute and apply cleaned benchmark counts per model c2e86ea evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Harden cleanHierarchy fallback and add family-name filter chips 8529a4b evijit HF Staff Claude Sonnet 4.6 committed 22 days ago
Restructure model details + extend cleanHierarchy for split families and aggregator dedup 06313c1 evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Add list-view toggle to consolidate cross-family duplicate benchmarks 26eb09f evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Square off deep-dive theme and surface cross-family duplicates b75f4c3 evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Route peer-ranks fetch through SNAPSHOT_URL sidecar 6cc7b0b evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Group model/eval-detail benchmarks by hierarchy.json families f073e7a evijit HF Staff committed 22 days ago
Drop latest_timestamp fallback for release_date display 8717cca evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago
Guard summaryText against null in PolicyOverview c3a3598 j-chim Claude Opus 4.7 (1M context) committed 22 days ago
Consolidate hierarchy terminology + handle v2 hierarchy shape 350e866 evijit HF Staff committed 23 days ago
Reconcile UI with v2 backend payload + drop redundant signal cards d52d9e0 evijit HF Staff committed 23 days ago
Tighten eval cards UI and clean up stale local data 32864b0 evijit HF Staff Claude Opus 4.7 (1M context) committed 23 days ago
Add new component files and align app to EvalEval design system dbdd6d1 evijit HF Staff Claude Sonnet 4.6 committed 24 days ago
Replace shadcn-styled UI elements with design system primitives 187ffe6 evijit HF Staff Claude Sonnet 4.6 committed 24 days ago
Add plain-language captions and mode-aware framing for policy readers 3ad47c6 evijit HF Staff Claude Opus 4.7 (1M context) committed 27 days ago
Align user-facing labels with paper terminology 4be62f9 evijit HF Staff Claude Opus 4.7 (1M context) committed 27 days ago
Merge corpus dashboard into home as paper-aligned landing 5279156 evijit HF Staff Claude Opus 4.7 (1M context) committed 27 days ago
Add interpretive signals, corpus dashboard, and slice browser bca888a evijit HF Staff Claude Opus 4.7 (1M context) committed 30 days ago
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0 evijit HF Staff committed on Apr 14
Add per-benchmark comparison histograms on model detail 415ac43 evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13