Bump clean-hierarchy cache version to v13 to drop stale blob · 4d3de5c · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Merge cross-source benchmark families; tidy leaderboard panel + table chrome · 8ef4cbc · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Drop alias-only single-bench families without merging them · cb0db40 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Restore curated benchmark families; polish frontier panel UX · ca20f78 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Live snapshot date, hide empty Updated col, clean slice contamination · cb0ce7c · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Humanize family names whose display matches the key under different separators · b763f91 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 6
Make /models tables column-sortable; rebalance /evals + /models toolbars · 5a2d59c · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Clean up Source column and per-row dataset label noise · eec1852 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Hide subtask-scope metrics from chips by default in matrix view · 4cb8b56 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Render score-distribution metric picker as chips, not a dropdown · 1303965 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Treat single-root-metric subtask evals as slice-pickable, not matrix · 4ac3a9b · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Precompute eval matrices for multi-metric + per-slice leaderboards · 553b175 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Restore HF Open LLM v2 composite and dedup vals.ai aliases · 6db4f51 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Move split selector below the reporting comparison heading · 629a612 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Fix sort toggle direction and remove categories as sortable column · c9c5a30 · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Sort evals list by family name; add sortable columns; use cleaned display names · 919a75f · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Fix ranks-high/low-in using only sidecar ordinal data · 970fdbe · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Wire search bar to overlaps table and hide chips in overlaps view · 0f5fb5f · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Compute and apply cleaned benchmark counts per model · c2e86ea · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Remove raw-hierarchy fallback — only ever serve cleaned hierarchy · b5fa10d · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Harden cleanHierarchy fallback and add family-name filter chips · 8529a4b · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Bump clean-hierarchy cache version to v10 to bust stale HF Space cache · 4bf0591 · evijit (HF Staff), Claude Sonnet 4.6 · committed on May 5
Restructure model details + extend cleanHierarchy for split families and aggregator dedup · 06313c1 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Add list-view toggle to consolidate cross-family duplicate benchmarks · 26eb09f · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Square off deep-dive theme and surface cross-family duplicates · b75f4c3 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Prefer /data persistent bucket for sidecar cache when available · dc95237 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Disk-cache snapshot sidecars to skip cold-start re-downloads · 40339dc · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Route peer-ranks fetch through SNAPSHOT_URL sidecar · 6cc7b0b · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Group model/eval-detail benchmarks by hierarchy.json families · f073e7a · evijit (HF Staff) · committed on May 5
Drop latest_timestamp fallback for release_date display · 8717cca · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 5
Guard summaryText against null in PolicyOverview · c3a3598 · j-chim, Claude Opus 4.7 (1M context) · committed on May 5
Wrap /evals page in Suspense for useSearchParams · 3df9dfd · j-chim, Claude Opus 4.7 (1M context) · committed on May 5
Consolidate hierarchy terminology + handle v2 hierarchy shape · 350e866 · evijit (HF Staff) · committed on May 4
Reconcile UI with v2 backend payload + drop redundant signal cards · d52d9e0 · evijit (HF Staff) · committed on May 4
Tighten eval cards UI and clean up stale local data · 32864b0 · evijit (HF Staff), Claude Opus 4.7 (1M context) · committed on May 3