Commit History
Merge remote-tracking branch 'origin/main' into feat/use-new-backend-data 25ba6d0
Tighten eval cards UI and clean up stale local data 32864b0
Drop input_modalities/output_modalities from MODEL_CARD_COLUMNS bfce8f2
Integrate with test backend data 7635aee
Add new component files and align app to EvalEval design system dbdd6d1
Replace shadcn-styled UI elements with design system primitives 187ffe6
Remove unused public/peer-ranks.json 2d949cf
Add plain-language captions and mode-aware framing for policy readers 3ad47c6
Align user-facing labels with paper terminology 4be62f9
Merge corpus dashboard into home as paper-aligned landing 5279156
Bake DuckDB envs into runner stage 051fa16
Restore DuckDB-aware build cache logic 154e1d8
Point Dockerfile at production card_backend dataset db192b0
Fix Fibble Arena (and similar) suite link routing c569d0f
Sort eval-detail filenames by codepoint for parity 819e7c9
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Set LOCAL_PIPELINE_OUTPUT/HF_DATA_OFFLINE at Docker build time fe5af86
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Bake DuckDB build-time defaults into Dockerfile 34ddba0
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Deploy DuckDB-backed frontend to da8db3e
Jenny Chim committed on
Add three-tier test infrastructure for migration safety d3cbe09
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Add DuckDB shadow-read backend with source-metadata fix 2fcae3f
Jenny Chim, Claude Opus 4.7 (1M context) committed on
Separate policy and researcher views 9b4cdbb
Add interpretive signals, corpus dashboard, and slice browser bca888a
Preserve evaluator_relationship when flattening model hierarchy 431b0cc
improve ux 8058fce
Differentiate audience modes and tighten eval navigation d8c2856
Aggregate setup aliases and clarify benchmark variants dd0b4fc
Fix RewardBench2 key normalization for matrix leaderboard routing 8821e18
Improve eval/model UX, lite data paths, and leaderboard clarity 436ada0
Improve homepage loading and eval grouping 26a0d2d
Use HF dataset's peer-ranks.json instead of local recomputation 6a6446b
Add per-benchmark comparison histograms on model detail 415ac43
Add site favicon metadata 35729f5
Improve eval score displays and summary fallbacks bd8cbe8
Harden aggregate evals and cache refresh 9d14977
Refine evaluation browsing UX a0dd44e
Ignore local cache and data artifacts 0e12c7f
Refresh eval cards UI and backend data flow c1f2130
Fix survey submit: use correct HF commit API JSON format with files array c5372a8
Fix survey submit: use multipart form data for HF commit API 9481599
Add alert feedback on survey submit success/failure 872607f
Fix HF commit API field: summary not commit_message 023694a
Fix survey submission: use HF commit API instead of deprecated upload ddf16f4
Add survey submission and update survey text for public use 516ec04
fix bugs 29afc21
fix bugs ae1dc39
fix bugs 04b4cff
chore: sync EEE pipeline output [2026-04-07 05:13 UTC] ddebd57
GitHub Actions committed on