evaleval / general-eval-card / scripts

Commit History

Add validated evaluator badge

478ae6c

j-chim committed 23 days ago

Update readme for pre-push script

82e8fdb

j-chim committed 25 days ago

Update about (#8)

599471d

j-chim committed 27 days ago

Gate content: leaderboard-parity migration gate (query == comparison-index)

1d79b9e

j-chim Claude Opus 4.8 (1M context) committed about 1 month ago

Resilience + linux gate: connection-failure reset + pre-push DuckDB read-path gate

3fd5483

j-chim Claude Opus 4.8 (1M context) committed about 1 month ago

Warm sidecar cache and show loading progress

83ef54a

evijit HF Staff committed about 1 month ago

Live snapshot date, hide empty Updated col, clean slice contamination

cb0ce7c

evijit HF Staff Claude Opus 4.7 (1M context) committed on May 6

Precompute eval matrices for multi-metric + per-slice leaderboards

553b175

evijit HF Staff Claude Opus 4.7 (1M context) committed on May 5

Restructure model details + extend cleanHierarchy for split families and aggregator dedup

06313c1

evijit HF Staff Claude Opus 4.7 (1M context) committed on May 5

Swap backend data (#3)

fe99ffa

evijit HF Staff j-chim committed on May 4

Restore DuckDB-aware build cache logic

154e1d8

j-chim Claude Opus 4.7 (1M context) committed on Apr 29

Sort eval-detail filenames by codepoint for parity

819e7c9

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 29

Deploy DuckDB-backed frontend to

da8db3e

Jenny Chim committed on Apr 29

Add three-tier test infrastructure for migration safety

d3cbe09

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 28

Add DuckDB shadow-read backend with source-metadata fix

2fcae3f

Jenny Chim Claude Opus 4.7 (1M context) committed on Apr 28

Add interpretive signals, corpus dashboard, and slice browser

bca888a

evijit HF Staff Claude Opus 4.7 (1M context) committed on Apr 27

Aggregate setup aliases and clarify benchmark variants

dd0b4fc

evijit HF Staff committed on Apr 14

Improve eval/model UX, lite data paths, and leaderboard clarity

436ada0

evijit HF Staff committed on Apr 14

Use HF dataset's peer-ranks.json instead of local recomputation

6a6446b

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Add per-benchmark comparison histograms on model detail

415ac43

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Harden aggregate evals and cache refresh

9d14977

evijit HF Staff committed on Apr 13

Refresh eval cards UI and backend data flow

c1f2130

evijit HF Staff committed on Apr 10

fix bugs

ae1dc39

evijit HF Staff committed on Apr 7

redesigned

3a12290

evijit HF Staff committed on Mar 27

fix data

ddfc163

Avijit Ghosh committed on Dec 16, 2025

Refactor: Update benchmarks with realistic data, fix UI stats, and improve About page

2554366

Avijit Ghosh committed on Dec 16, 2025

new ux

6978d97

Avijit Ghosh committed on Dec 16, 2025

added all the new files

509e21e

evijit HF Staff committed on Aug 16, 2025
