Spaces: yananlong / general-eval-card

general-eval-card / lib

Commit History

fix joins

fbbeb45

Yanan Long committed 22 days ago

Merge origin/main and integrate research joins

e50a33e

Yanan Long committed 22 days ago

Bump clean-hierarchy cache version to v13 to drop stale blob

4d3de5c

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Merge cross-source benchmark families; tidy leaderboard panel + table chrome

8ef4cbc

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Drop alias-only single-bench families without merging them

cb0db40

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Restore curated benchmark families; polish frontier panel UX

ca20f78

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Precompute eval matrices for multi-metric + per-slice leaderboards

553b175

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Restore HF Open LLM v2 composite and dedup vals.ai aliases

6db4f51

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Add local parquet read support

aa29970

j-chim committed 22 days ago

Sort evals list by family name; add sortable columns; use cleaned display names

919a75f

evijit HF Staff Claude Sonnet 4.6 committed 22 days ago

Dedup logic to counts

aac276a

j-chim committed 22 days ago

Compute and apply cleaned benchmark counts per model

c2e86ea

evijit HF Staff Claude Sonnet 4.6 committed 22 days ago

Remove raw-hierarchy fallback — only ever serve cleaned hierarchy

b5fa10d

evijit HF Staff Claude Sonnet 4.6 committed 22 days ago

Harden cleanHierarchy fallback and add family-name filter chips

8529a4b

evijit HF Staff Claude Sonnet 4.6 committed 22 days ago

Bump clean-hierarchy cache version to v10 to bust stale HF Space cache

4bf0591

evijit HF Staff Claude Sonnet 4.6 committed 22 days ago

Restructure model details + extend cleanHierarchy for split families and aggregator dedup

06313c1

evijit HF Staff Claude Opus 4.7 (1M context) committed 22 days ago

Add option to purge cache

f2e3a0a

j-chim committed 23 days ago

stats change

f816900

j-chim committed 23 days ago

Prefer /data persistent bucket for sidecar cache when available

dc95237

evijit HF Staff Claude Opus 4.7 (1M context) committed 23 days ago

Disk-cache snapshot sidecars to skip cold-start re-downloads

40339dc

evijit HF Staff Claude Opus 4.7 (1M context) committed 23 days ago

Switch family/model views to curated category tags

bc08b3b

evijit HF Staff committed 23 days ago

Route peer-ranks fetch through SNAPSHOT_URL sidecar

6cc7b0b

evijit HF Staff Claude Opus 4.7 (1M context) committed 23 days ago

Hotfix: categories

a80dd9f

j-chim committed 23 days ago

Group model/eval-detail benchmarks by hierarchy.json families

f073e7a

evijit HF Staff committed 23 days ago

Refactor to align on benchmark hierarchy

2ed4959

j-chim committed 23 days ago

Update with datafix v2

11542d9

j-chim committed 23 days ago

Swap backend data (#3)

fe99ffa

evijit HF Staff j-chim committed 24 days ago

Tighten eval cards UI and clean up stale local data

32864b0

evijit HF Staff Claude Opus 4.7 (1M context) committed 24 days ago

Add researcher join analysis to eval detail

08fde08

Yanan Long committed 29 days ago

Merge corpus dashboard into home as paper-aligned landing

5279156

evijit HF Staff Claude Opus 4.7 (1M context) committed 27 days ago

Deploy DuckDB-backed frontend to

da8db3e

Jenny Chim committed 29 days ago

Add DuckDB shadow-read backend with source-metadata fix

2fcae3f

Jenny Chim Claude Opus 4.7 (1M context) committed 30 days ago

Separate policy and researcher views

9b4cdbb

evijit HF Staff committed 29 days ago

Add interpretive signals, corpus dashboard, and slice browser

bca888a

evijit HF Staff Claude Opus 4.7 (1M context) committed about 1 month ago

Preserve evaluator_relationship when flattening model hierarchy

431b0cc

evijit HF Staff committed on Apr 15

improve ux

8058fce

evijit HF Staff committed on Apr 15

Differentiate audience modes and tighten eval navigation

d8c2856

evijit HF Staff committed on Apr 15

Aggregate setup aliases and clarify benchmark variants

dd0b4fc

evijit HF Staff committed on Apr 14

Fix RewardBench2 key normalization for matrix leaderboard routing

8821e18

evijit HF Staff committed on Apr 14

Improve eval/model UX, lite data paths, and leaderboard clarity

436ada0

evijit HF Staff committed on Apr 14

Add per-benchmark comparison histograms on model detail

415ac43

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 13

Improve eval score displays and summary fallbacks

bd8cbe8

evijit HF Staff committed on Apr 13

Refresh eval cards UI and backend data flow

c1f2130

evijit HF Staff committed on Apr 10

Add survey submission and update survey text for public use

516ec04

evijit HF Staff Claude Opus 4.6 (1M context) committed on Apr 8

fix bugs

ae1dc39

evijit HF Staff committed on Apr 7

fix bugs

04b4cff

evijit HF Staff committed on Apr 7

ux changes

5f59721

evijit HF Staff committed on Apr 6

Add survey

e7123f0

evijit HF Staff committed on Apr 6

fix: align reporting cues and developer slugs

5ca5561

evijit HF Staff committed on Mar 28

feat: refine model and benchmark exploration

03e2430

evijit HF Staff committed on Mar 28
