Spaces: evaleval/general-eval-card

Commit History

Use model_key as the addressable identifier and wire comparison-index sidecar

0e529dc

j-chim committed on May 4

Merge remote-tracking branch 'origin/main' into feat/use-new-backend-data

25ba6d0

j-chim committed on May 3

Tighten eval cards UI and clean up stale local data

32864b0

evijit HF Staff Claude Opus 4.7 (1M context) committed on May 3

Drop input_modalities/output_modalities from MODEL_CARD_COLUMNS

bfce8f2

j-chim Claude Opus 4.7 (1M context) committed on May 3

Integrate with test backend data

7635aee

j-chim committed on May 3

Add new component files and align app to EvalEval design system

dbdd6d1

evijit HF Staff Claude Sonnet 4.6 committed on May 3

Replace shadcn-styled UI elements with design system primitives

187ffe6

evijit HF Staff Claude Sonnet 4.6 committed on May 3

Remove unused public/peer-ranks.json

2d949cf

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 30

Add plain-language captions and mode-aware framing for policy readers

3ad47c6

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 30

Align user-facing labels with paper terminology

4be62f9

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 30

Merge corpus dashboard into home as paper-aligned landing

5279156

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 30

Bake DuckDB envs into runner stage

051fa16

j-chim Claude Opus 4.7 (1M context) committed on Apr 29

Restore DuckDB-aware build cache logic

154e1d8

j-chim Claude Opus 4.7 (1M context) committed on Apr 29

Point Dockerfile at production card_backend dataset

db192b0

j-chim Claude Opus 4.7 (1M context) committed on Apr 29

Fix Fibble Arena (and similar) suite link routing

c569d0f

j-chim Claude Opus 4.7 (1M context) committed on Apr 29

Sort eval-detail filenames by codepoint for parity

819e7c9

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 29

Set LOCAL_PIPELINE_OUTPUT/HF_DATA_OFFLINE at Docker build time

fe5af86

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 29

Bake DuckDB build-time defaults into Dockerfile

34ddba0

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 29

Deploy DuckDB-backed frontend to

da8db3e

Jenny Chim committed on Apr 29

Add three-tier test infrastructure for migration safety

d3cbe09

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 28

Add DuckDB shadow-read backend with source-metadata fix

2fcae3f

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 28

Separate policy and researcher views

9b4cdbb

evijit HF Staff committed on Apr 29

Add interpretive signals, corpus dashboard, and slice browser

bca888a

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 27

Preserve evaluator_relationship when flattening model hierarchy

431b0cc

evijit HF Staff committed on Apr 15

improve ux

8058fce

evijit HF Staff committed on Apr 15

Differentiate audience modes and tighten eval navigation

d8c2856

evijit HF Staff committed on Apr 15

Aggregate setup aliases and clarify benchmark variants

dd0b4fc

evijit HF Staff committed on Apr 14

Fix RewardBench2 key normalization for matrix leaderboard routing

8821e18

evijit HF Staff committed on Apr 14

Improve eval/model UX, lite data paths, and leaderboard clarity

436ada0

evijit HF Staff committed on Apr 14

Improve homepage loading and eval grouping

26a0d2d

evijit HF Staff committed on Apr 14

Use HF dataset's peer-ranks.json instead of local recomputation

6a6446b

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Add per-benchmark comparison histograms on model detail

415ac43

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Add site favicon metadata

35729f5

evijit HF Staff committed on Apr 13

Improve eval score displays and summary fallbacks

bd8cbe8

evijit HF Staff committed on Apr 13

Harden aggregate evals and cache refresh

9d14977

evijit HF Staff committed on Apr 13

Refine evaluation browsing UX

a0dd44e

evijit HF Staff committed on Apr 13

Ignore local cache and data artifacts

0e12c7f

evijit HF Staff committed on Apr 10