Commit History
Precompute eval matrices for multi-metric + per-slice leaderboards 553b175
Restructure model details + extend cleanHierarchy for split families and aggregator dedup 06313c1
Restore DuckDB-aware build cache logic 154e1d8
Sort eval-detail filenames by codepoint for parity 819e7c9
Jenny Chim and Claude Opus 4.7 (1M context) committed on
Deploy DuckDB-backed frontend to da8db3e
Jenny Chim committed on
Add three-tier test infrastructure for migration safety d3cbe09
Jenny Chim and Claude Opus 4.7 (1M context) committed on
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f
Jenny Chim and Claude Opus 4.7 (1M context) committed on
Add interpretive signals, corpus dashboard, and slice browser bca888a
Aggregate setup aliases and clarify benchmark variants dd0b4fc
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0
Use HF dataset's peer-ranks.json instead of local recomputation 6a6446b
Add per-benchmark comparison histograms on model detail 415ac43
Harden aggregate evals and cache refresh 9d14977
Refresh eval cards UI and backend data flow c1f2130
fix bugs ae1dc39
redesigned 3a12290
fix data ddfc163
Avijit Ghosh committed on
Refactor: Update benchmarks with realistic data, fix UI stats, and improve About page 2554366
Avijit Ghosh committed on
new ux 6978d97
Avijit Ghosh committed on