Live snapshot date, hide empty Updated col, clean slice contamination cb0ce7c evijit HF Staff Claude Opus 4.7 (1M context) committed on 17 days ago
Precompute eval matrices for multi-metric + per-slice leaderboards 553b175 evijit HF Staff Claude Opus 4.7 (1M context) committed on 17 days ago
Restructure model details + extend cleanHierarchy for split families and aggregator dedup 06313c1 evijit HF Staff Claude Opus 4.7 (1M context) committed on 17 days ago
Restore DuckDB-aware build cache logic 154e1d8 j-chim Claude Opus 4.7 (1M context) committed on 24 days ago
Sort eval-detail filenames by codepoint for parity 819e7c9 Jenny Chim Claude Opus 4.7 (1M context) committed on 24 days ago
Add three-tier test infrastructure for migration safety d3cbe09 Jenny Chim Claude Opus 4.7 (1M context) committed on 25 days ago
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f Jenny Chim Claude Opus 4.7 (1M context) committed on 25 days ago
Add interpretive signals, corpus dashboard, and slice browser bca888a evijit HF Staff Claude Opus 4.7 (1M context) committed on 25 days ago
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0 evijit HF Staff committed on Apr 14
Use HF dataset's peer-ranks.json instead of local recomputation 6a6446b evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13
Add per-benchmark comparison histograms on model detail 415ac43 evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13
Refactor: Update benchmarks with realistic data, fix UI stats, and improve About page 2554366 Avijit Ghosh committed on Dec 16, 2025