NowAI-Bench / tasks /todo.md
bradnow's picture
Add title animation demo with hover effects and accessibility considerations
df7f292
|
Raw
History Blame Contribute Delete
2.33 kB
# Leaderboard Card Fixes β€” Task List
Source: annotated mockup review (2026-06-02). Targets `index.html` lines 161–292.
## EnterpriseOps-Gym card
- [ ] **T1 β€” Clickable title.** Wrap `<h3>EnterpriseOps-Gym</h3>` (L165) in a link to the EOG webpage. Add hover-over info.
- [ ] **T2 β€” Drop "cascade".** `Anthropic Β· cascade` β†’ `Anthropic` (L171).
- [ ] **T3 β€” Rename metric label.** `Success rate Β· Oracle mode` β†’ `Task Success Rate Β· Oracle mode` (L175).
- [ ] **T4 β€” Metric hover.** Add hover-over def on the metric: "A task passes only if all verification conditions are met."
## EVA-Bench card
- [ ] **T5 β€” Clickable title.** Wrap `<h3>EVA-Bench</h3>` (L217) in a link to the EVA webpage. Add hover-over info.
- [ ] **T6 β€” Accuracy section label.** `Accuracy` β†’ `EVA-Accuracy` (L224).
- [ ] **T7 β€” Accuracy metric tag.** `EVA-A Β· PASS@3` β†’ `Pass@1` (L225).
- [ ] **T8 β€” Accuracy metric hover.** Def: "Scores for accuracy. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- [ ] **T9 β€” Cascade subtitle.** `cascade Β· mixed` (L230) β€” keep as-is (annotation just maps it to "Mixed Models | Cascade"; current value already correct). Confirm no change.
- [ ] **T10 β€” Experience section label.** `Experience` β†’ `EVA-Experience` (L256).
- [ ] **T11 β€” Experience metric tag.** `EVA-X Β· PASS@3` β†’ `Pass@1` (L257).
- [ ] **T12 β€” Experience metric hover.** Def: "Scores for conversational experience. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
- [ ] **T13 β€” S2S subtitle.** `Google Β· S2S` (L262) β€” keep as-is (annotation maps it to "Google | Speech-to-Speech"; current already correct). Confirm no change.
## Open questions (need answers before implementing)
1. **Title link URLs.** Use same as the "View full leaderboard" links? EOG β†’ `https://enterpriseops-gym.github.io/`, EVA β†’ `https://servicenow.github.io/eva/`. Confirm.
2. **Hover mechanism.** Native `title=""` attribute (simple, less styled) vs custom CSS/JS tooltip (accessible, on-brand, more work)? Recommend custom accessible tooltip.
3. **T7/T11.** Confirm full replace of `EVA-A Β· PASS@3` / `EVA-X Β· PASS@3` with just `Pass@1` (drops the EVA-A/EVA-X code and changes @3 β†’ @1).