Spaces: evaleval / general-eval-card


general-eval-card / lib

Commit History

Use model_key as the addressable identifier and wire comparison-index sidecar

0e529dc

j-chim committed 27 days ago

Merge remote-tracking branch 'origin/main' into feat/use-new-backend-data

25ba6d0

j-chim committed 27 days ago

Tighten eval cards UI and clean up stale local data

32864b0

evijit HF Staff Claude Opus 4.7 (1M context) committed 27 days ago

Drop input_modalities/output_modalities from MODEL_CARD_COLUMNS

bfce8f2

j-chim Claude Opus 4.7 (1M context) committed 27 days ago

Integrate with test backend data

7635aee

j-chim committed 27 days ago

Merge corpus dashboard into home as paper-aligned landing

5279156

evijit HF Staff Claude Opus 4.7 (1M context) committed about 1 month ago

Deploy DuckDB-backed frontend to

da8db3e

Jenny Chim committed on Apr 29

Add DuckDB shadow-read backend with source-metadata fix

2fcae3f

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 28

Separate policy and researcher views

9b4cdbb

evijit HF Staff committed on Apr 29

Add interpretive signals, corpus dashboard, and slice browser

bca888a

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 27

Preserve evaluator_relationship when flattening model hierarchy

431b0cc

evijit HF Staff committed on Apr 15

improve ux

8058fce

evijit HF Staff committed on Apr 15

Differentiate audience modes and tighten eval navigation

d8c2856

evijit HF Staff committed on Apr 15

Aggregate setup aliases and clarify benchmark variants

dd0b4fc

evijit HF Staff committed on Apr 14

Fix RewardBench2 key normalization for matrix leaderboard routing

8821e18

evijit HF Staff committed on Apr 14

Improve eval/model UX, lite data paths, and leaderboard clarity

436ada0

evijit HF Staff committed on Apr 14

Add per-benchmark comparison histograms on model detail

415ac43

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Improve eval score displays and summary fallbacks

bd8cbe8

evijit HF Staff committed on Apr 13

Refresh eval cards UI and backend data flow

c1f2130

evijit HF Staff committed on Apr 10

Add survey submission and update survey text for public use

516ec04

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 8

fix bugs

ae1dc39

evijit HF Staff committed on Apr 7

fix bugs

04b4cff

evijit HF Staff committed on Apr 7

ux changes

5f59721

evijit HF Staff committed on Apr 6

Add survey

e7123f0

evijit HF Staff committed on Apr 6

fix: align reporting cues and developer slugs

5ca5561

evijit HF Staff committed on Mar 28

feat: refine model and benchmark exploration

03e2430

evijit HF Staff committed on Mar 28

redesigned

3a12290

evijit HF Staff committed on Mar 27

fix data

ddfc163

Avijit Ghosh committed on Dec 16, 2025

Refactor: Update benchmarks with realistic data, fix UI stats, and improve About page

2554366

Avijit Ghosh committed on Dec 16, 2025

new ux

6978d97

Avijit Ghosh committed on Dec 16, 2025

fixed a lot of bugs, centralized schema

49d5ba7

evijit HF Staff committed on Aug 17, 2025

unit tests added

a58dac7

evijit HF Staff committed on Aug 16, 2025

added all the new files

509e21e

evijit HF Staff committed on Aug 16, 2025
