# Leaderboard Card Fixes: Task List

Source: annotated mockup review (2026-06-02). Targets `index.html` lines 161–292.

## EnterpriseOps-Gym card

- [ ] **T1: Clickable title.** Wrap `<h3>EnterpriseOps-Gym</h3>` (L165) in a link to the EOG webpage. Add hover-over info.
- [ ] **T2: Drop "cascade".** `Anthropic · cascade` → `Anthropic` (L171).
- [ ] **T3: Rename metric label.** `Success rate · Oracle mode` → `Task Success Rate · Oracle mode` (L175).
- [ ] **T4: Metric hover.** Add a hover-over definition on the metric: "A task passes only if all verification conditions are met."
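
As a rough illustration of T1 to T4 applied together, the card markup might end up looking like the sketch below. Only the `<h3>` text, the label strings, and the metric definition come from this list. The class names, the wrapper elements, and the `TODO` placeholder for the title hover copy are assumptions, the URL is still pending open question 1, and native `title` attributes stand in for whichever hover mechanism open question 2 settles on.

```html
<!-- Sketch only: class names and wrapper elements are hypothetical, not copied from index.html. -->
<article class="card">
  <!-- T1: title wrapped in a link (URL pending open question 1; hover copy not yet written) -->
  <a href="https://enterpriseops-gym.github.io/" title="TODO: hover copy for the EOG title">
    <h3>EnterpriseOps-Gym</h3>
  </a>

  <!-- T2: "· cascade" dropped from the subtitle -->
  <p class="card-subtitle">Anthropic</p>

  <!-- T3 + T4: renamed metric label carrying the hover definition -->
  <p class="metric-label" title="A task passes only if all verification conditions are met.">
    Task Success Rate · Oracle mode
  </p>
</article>
```
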
## EVA-Bench card

- [ ] **T5: Clickable title.** Wrap `<h3>EVA-Bench</h3>` (L217) in a link to the EVA webpage. Add hover-over info.
- [ ] **T6: Accuracy section label.** `Accuracy` → `EVA-Accuracy` (L224).
- [ ] **T7: Accuracy metric tag.** `EVA-A · PASS@3` → `Pass@1` (L225).
- [ ] **T8: Accuracy metric hover.** Definition: "Scores for accuracy. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- [ ] **T9: Cascade subtitle.** `cascade · mixed` (L230): keep as-is. The annotation only maps it to "Mixed Models | Cascade", and the current value is already correct. Confirm no change.
- [ ] **T10: Experience section label.** `Experience` → `EVA-Experience` (L256).
- [ ] **T11: Experience metric tag.** `EVA-X · PASS@3` → `Pass@1` (L257).
- [ ] **T12: Experience metric hover.** Definition: "Scores for conversational experience. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- [ ] **T13: S2S subtitle.** `Google · S2S` (L262): keep as-is. The annotation maps it to "Google | Speech-to-Speech", and the current value is already correct. Confirm no change.

## Open questions (need answers before implementing)

1. **Title link URLs.** Use the same URLs as the "View full leaderboard" links? EOG → `https://enterpriseops-gym.github.io/`, EVA → `https://servicenow.github.io/eva/`. Confirm.
2. **Hover mechanism.** Native `title=""` attribute (simple, less styled) vs. custom CSS/JS tooltip (accessible, on-brand, more work)? Recommendation: custom accessible tooltip.
3. **T7/T11.** Confirm full replacement of `EVA-A · PASS@3` / `EVA-X · PASS@3` with just `Pass@1` (drops the EVA-A/EVA-X code and changes @3 → @1).
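
If the hover mechanism question is answered in favor of a custom tooltip, one minimal CSS-only shape for it is sketched below, using the T4 definition as the example. The class name `has-tip`, the id `tip-eog-metric`, and all styling values are assumptions for illustration, not existing names in `index.html`. The label is made keyboard-focusable and tied to the tooltip text with `aria-describedby`, so the definition is available on focus as well as on hover.

```html
<!-- Sketch only: class names, the id, and the styling values are hypothetical. -->
<style>
  .has-tip { position: relative; display: inline-block; }

  .has-tip [role="tooltip"] {
    position: absolute;
    left: 0;
    top: 100%;
    z-index: 10;
    width: max-content;
    max-width: 18rem;
    padding: 0.5rem;
    background: #222;
    color: #fff;
    visibility: hidden; /* hidden until hover or keyboard focus */
  }

  /* Show the tooltip on mouse hover or when the label receives focus */
  .has-tip:hover [role="tooltip"],
  .has-tip:focus-within [role="tooltip"] { visibility: visible; }
</style>

<span class="has-tip">
  <!-- tabindex="0" makes the label reachable by keyboard;
       aria-describedby points assistive technology at the tooltip text -->
  <span tabindex="0" aria-describedby="tip-eog-metric">Task Success Rate · Oracle mode</span>
  <span role="tooltip" id="tip-eog-metric">A task passes only if all verification conditions are met.</span>
</span>
```

This version has no scripting, so it cannot be dismissed with Escape. Adding that behavior would need a small amount of JS, which is part of the "more work" trade-off noted for the custom option.
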