Leaderboard Card Fixes: Task List
Source: annotated mockup review (2026-06-02). Targets index.html lines 161–292.
EnterpriseOps-Gym card
- T1: Clickable title. Wrap `<h3>EnterpriseOps-Gym</h3>` (L165) in a link to the EOG webpage. Add hover-over info.
- T2: Drop "cascade". `Anthropic · cascade` → `Anthropic` (L171).
- T3: Rename metric label. `Success rate · Oracle mode` → `Task Success Rate · Oracle mode` (L175).
- T4: Metric hover. Add a hover-over definition on the metric: "A task passes only if all verification conditions are met."
EVA-Bench card
- T5: Clickable title. Wrap `<h3>EVA-Bench</h3>` (L217) in a link to the EVA webpage. Add hover-over info.
- T6: Accuracy section label. `Accuracy` → `EVA-Accuracy` (L224).
- T7: Accuracy metric tag. `EVA-A · PASS@3` → `Pass@1` (L225).
- T8: Accuracy metric hover. Definition: "Scores for accuracy. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- T9: Cascade subtitle. `cascade · mixed` (L230): keep as-is (annotation just maps it to "Mixed Models | Cascade"; current value already correct). Confirm no change.
- T10: Experience section label. `Experience` → `EVA-Experience` (L256).
- T11: Experience metric tag. `EVA-X · PASS@3` → `Pass@1` (L257).
- T12: Experience metric hover. Definition: "Scores for conversational experience. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- T13: S2S subtitle. `Google · S2S` (L262): keep as-is (annotation maps it to "Google | Speech-to-Speech"; current value already correct). Confirm no change.
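
The label changes for both cards (T2, T3, T6, T7, T10, T11) could be captured in one lookup table, which also makes it easy to check that the "keep as-is" subtitles (T9, T13) are untouched. This is only a sketch: the names `LABEL_FIXES`, `KEEP_AS_IS`, and `fixLabel` are hypothetical, and it assumes each label's text matches the strings in this task list exactly (including the literal `·` character). The T7/T11 entries are provisional until the open question below is answered.

```javascript
// Hypothetical lookup: old label text -> new label text, copied from the task list.
// T7 and T11 are still open questions, so treat those two entries as provisional.
const LABEL_FIXES = new Map([
  ["Anthropic · cascade", "Anthropic"],                               // T2
  ["Success rate · Oracle mode", "Task Success Rate · Oracle mode"],  // T3
  ["Accuracy", "EVA-Accuracy"],                                       // T6
  ["EVA-A · PASS@3", "Pass@1"],                                       // T7 (provisional)
  ["Experience", "EVA-Experience"],                                   // T10
  ["EVA-X · PASS@3", "Pass@1"],                                       // T11 (provisional)
]);

// Labels the task list says must stay unchanged (T9, T13).
const KEEP_AS_IS = ["cascade · mixed", "Google · S2S"];

// Return the corrected label for an exact (whitespace-trimmed) match,
// or the input unchanged when no fix applies. Exact matching avoids
// rewriting "Accuracy" inside an already-fixed "EVA-Accuracy".
function fixLabel(text) {
  const key = text.trim();
  return LABEL_FIXES.has(key) ? LABEL_FIXES.get(key) : text;
}
```

Because the lookup is an exact match, running it twice over the same label is harmless, so it can double as a quick check after the edits to index.html are made by hand.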
Open questions (need answers before implementing)
- Title link URLs. Use the same URLs as the "View full leaderboard" links? EOG → https://enterpriseops-gym.github.io/, EVA → https://servicenow.github.io/eva/. Confirm.
- Hover mechanism. Native `title=""` attribute (simple, less styled) vs custom CSS/JS tooltip (accessible, on-brand, more work)? Recommend custom accessible tooltip.
- T7/T11. Confirm full replace of `EVA-A · PASS@3` / `EVA-X · PASS@3` with just `Pass@1` (drops the EVA-A/EVA-X code and changes @3 → @1).
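
To make the hover-mechanism question more concrete, here is a minimal sketch of the markup a custom accessible tooltip might produce, using the T4 definition as the example. The function names, the class names (`has-tip`, `tip`), and the `tip-` id scheme are assumptions for illustration, not anything present in index.html. The sketch relies only on the standard `role="tooltip"` and `aria-describedby` attributes, plus `tabindex="0"` so keyboard users can reach the label.

```javascript
// Escape text for safe use in HTML text nodes and double-quoted attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build markup for a focusable label plus its associated tooltip.
// Class names and the id scheme are hypothetical placeholders.
function labelWithTooltip(id, label, definition) {
  const tipId = escapeHtml(`tip-${id}`);
  return (
    `<span class="has-tip" tabindex="0" aria-describedby="${tipId}">` +
    `${escapeHtml(label)}</span>` +
    `<span class="tip" role="tooltip" id="${tipId}">` +
    `${escapeHtml(definition)}</span>`
  );
}

// Example: the T4 metric label with its definition from the task list.
const eogMetric = labelWithTooltip(
  "eog-success",
  "Task Success Rate · Oracle mode",
  "A task passes only if all verification conditions are met."
);
```

With markup like this, the tooltip's visibility would be toggled in CSS on both hover and focus of the label. The native alternative is a single `title` attribute on the label, which needs no extra markup but cannot be styled.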