Bump clean-hierarchy cache version to v13 to drop stale blob 4d3de5c evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Merge cross-source benchmark families; tidy leaderboard panel + table chrome 8ef4cbc evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Drop alias-only single-bench families without merging them cb0db40 evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Restore curated benchmark families; polish frontier panel UX ca20f78 evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Precompute eval matrices for multi-metric + per-slice leaderboards 553b175 evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Restore HF Open LLM v2 composite and dedup vals.ai aliases 6db4f51 evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Sort evals list by family name; add sortable columns; use cleaned display names 919a75f evijit HF Staff Claude Sonnet 4.6 committed on 22 days ago
Compute and apply cleaned benchmark counts per model c2e86ea evijit HF Staff Claude Sonnet 4.6 committed on 22 days ago
Remove raw-hierarchy fallback; only ever serve cleaned hierarchy b5fa10d evijit HF Staff Claude Sonnet 4.6 committed on 22 days ago
Harden cleanHierarchy fallback and add family-name filter chips 8529a4b evijit HF Staff Claude Sonnet 4.6 committed on 22 days ago
Bump clean-hierarchy cache version to v10 to bust stale HF Space cache 4bf0591 evijit HF Staff Claude Sonnet 4.6 committed on 22 days ago
Restructure model details + extend cleanHierarchy for split families and aggregator dedup 06313c1 evijit HF Staff Claude Opus 4.7 (1M context) committed on 22 days ago
Prefer /data persistent bucket for sidecar cache when available dc95237 evijit HF Staff Claude Opus 4.7 (1M context) committed on 23 days ago
Disk-cache snapshot sidecars to skip cold-start re-downloads 40339dc evijit HF Staff Claude Opus 4.7 (1M context) committed on 23 days ago
Route peer-ranks fetch through SNAPSHOT_URL sidecar 6cc7b0b evijit HF Staff Claude Opus 4.7 (1M context) committed on 23 days ago
Group model/eval-detail benchmarks by hierarchy.json families f073e7a evijit HF Staff committed on 23 days ago
Tighten eval cards UI and clean up stale local data 32864b0 evijit HF Staff Claude Opus 4.7 (1M context) committed on 24 days ago
Merge corpus dashboard into home as paper-aligned landing 5279156 evijit HF Staff Claude Opus 4.7 (1M context) committed on 27 days ago
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f Jenny Chim Claude Opus 4.7 (1M context) committed on 30 days ago
Add interpretive signals, corpus dashboard, and slice browser bca888a evijit HF Staff Claude Opus 4.7 (1M context) committed on about 1 month ago
Preserve evaluator_relationship when flattening model hierarchy 431b0cc evijit HF Staff committed on Apr 15
Fix RewardBench2 key normalization for matrix leaderboard routing 8821e18 evijit HF Staff committed on Apr 14
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0 evijit HF Staff committed on Apr 14
Add per-benchmark comparison histograms on model detail 415ac43 evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13
Add survey submission and update survey text for public use 516ec04 evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 8