Commit History
Merge remote-tracking branch 'origin/main' into feat/use-new-backend-data 25ba6d0
Tighten eval cards UI and clean up stale local data 32864b0
Drop input_modalities/output_modalities from MODEL_CARD_COLUMNS bfce8f2
Integrate with test backend data 7635aee
Merge corpus dashboard into home as paper-aligned landing 5279156
Deploy DuckDB-backed frontend to da8db3e
Jenny Chim committed on
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Separate policy and researcher views 9b4cdbb
Add interpretive signals, corpus dashboard, and slice browser bca888a
Preserve evaluator_relationship when flattening model hierarchy 431b0cc
improve ux 8058fce
Differentiate audience modes and tighten eval navigation d8c2856
Aggregate setup aliases and clarify benchmark variants dd0b4fc
Fix RewardBench2 key normalization for matrix leaderboard routing 8821e18
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0
Add per-benchmark comparison histograms on model detail 415ac43
Improve eval score displays and summary fallbacks bd8cbe8
Refresh eval cards UI and backend data flow c1f2130
Add survey submission and update survey text for public use 516ec04
fix bugs ae1dc39
fix bugs 04b4cff
ux changes 5f59721
Add survey e7123f0
fix: align reporting cues and developer slugs 5ca5561
feat: refine model and benchmark exploration 03e2430
redesigned 3a12290
fix data ddfc163
Avijit Ghosh committed on
Refactor: Update benchmarks with realistic data, fix UI stats, and improve About page 2554366
Avijit Ghosh committed on
new ux 6978d97
Avijit Ghosh committed on