Commit History
Add Official/Community/All scope filter for developers; drop bar 4ba8d73
Simplify interpretive signals heading 8494e4c
Cross-suite signals, sortable leaderboard, theme cleanup 0314721
Cross-source dedup, plotbox polish, pretty URLs, eval page fallbacks 0b45710
Match nested benchmarks in /evals search; auto-expand families with hits 26f932a
Move reader-mode toggle to detail pages; theme banners + apples-to-apples 4629534
Merge cross-source benchmark families; tidy leaderboard panel + table chrome 8ef4cbc
Make /models tables column-sortable; rebalance /evals + /models toolbars 5a2d59c
Move split selector below the reporting comparison heading 629a612
Fix sort toggle direction and remove categories as sortable column c9c5a30
Sort evals list by family name; add sortable columns; use cleaned display names 919a75f
Dedup logic to counts aac276a
stats change f816900
Switch family/model views to curated category tags bc08b3b
Route peer-ranks fetch through SNAPSHOT_URL sidecar 6cc7b0b
Hotfix: categories a80dd9f
Group model/eval-detail benchmarks by hierarchy.json families f073e7a
Drop latest_timestamp fallback for release_date display 8717cca
Wrap /evals page in Suspense for useSearchParams 3df9dfd
Refactor to align on benchmark hierarchy 2ed4959
Remove unnecessary distinct() when reporting total results 2f8b51d
Update with datafix v2 11542d9
Consolidate hierarchy terminology + handle v2 hierarchy shape 350e866
Tighten eval cards UI and clean up stale local data 32864b0
Add new component files and align app to EvalEval design system dbdd6d1
Align user-facing labels with paper terminology 4be62f9
Merge corpus dashboard into home as paper-aligned landing 5279156
Fix Fibble Arena (and similar) suite link routing c569d0f
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f
Jenny Chim Claude Opus 4.7 (1M context) committed on