
Leaderboard Card Fixes: Task List

Source: annotated mockup review (2026-06-02). Targets index.html lines 161–292.

EnterpriseOps-Gym card

  • T1: Clickable title. Wrap <h3>EnterpriseOps-Gym</h3> (L165) in a link to the EOG webpage. Add hover-over info.
  • T2: Drop "cascade". Anthropic · cascade → Anthropic (L171).
  • T3: Rename metric label. Success rate · Oracle mode → Task Success Rate · Oracle mode (L175).
  • T4: Metric hover. Add hover-over def on the metric: "A task passes only if all verification conditions are met."

EVA-Bench card

  • T5: Clickable title. Wrap <h3>EVA-Bench</h3> (L217) in a link to the EVA webpage. Add hover-over info.
  • T6: Accuracy section label. Accuracy → EVA-Accuracy (L224).
  • T7: Accuracy metric tag. EVA-A · PASS@3 → Pass@1 (L225).
  • T8: Accuracy metric hover. Def: "Scores for accuracy. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
  • T9: Cascade subtitle. cascade · mixed (L230): keep as-is (annotation just maps it to "Mixed Models | Cascade"; current value already correct). Confirm no change.
  • T10: Experience section label. Experience → EVA-Experience (L256).
  • T11: Experience metric tag. EVA-X · PASS@3 → Pass@1 (L257).
  • T12: Experience metric hover. Def: "Scores for conversational experience. All values normalized to 0–1 (higher is better). 95% bootstrap confidence intervals shown for each value."
  • T13: S2S subtitle. Google · S2S (L262): keep as-is (annotation maps it to "Google | Speech-to-Speech"; current already correct). Confirm no change.
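
The pure text changes in both cards (T2, T3, T6, T7, T10, T11) and the two no-change confirmations (T9, T13) could be checked with a small line-targeted script. This is only a sketch: `LineEdit`, `applyLineEdits`, and `cardEdits` are hypothetical names, the `<span>` markup is a stand-in for the real index.html, and the T7/T11 entries assume the full replacement that is still an open question.

```typescript
// One text change (or no-change check) taken from the task list.
interface LineEdit {
  task: string; // task id, e.g. "T6"
  line: number; // 1-based line number in index.html
  from: string; // text expected on that line before the change
  to: string;   // replacement; equal to `from` for "keep as-is" tasks
}

// Applies each edit to its own line only and reports tasks whose
// expected text was not found, so stale line numbers surface early.
function applyLineEdits(
  lines: string[],
  edits: LineEdit[],
): { lines: string[]; missing: string[] } {
  const out = lines.slice(); // work on a copy; the input stays untouched
  const missing: string[] = [];
  for (const e of edits) {
    const current = out[e.line - 1];
    if (current === undefined || current.indexOf(e.from) === -1) {
      missing.push(e.task);
      continue;
    }
    // Function replacer: `to` is inserted literally, first match only.
    out[e.line - 1] = current.replace(e.from, () => e.to);
  }
  return { lines: out, missing };
}

const cardEdits: LineEdit[] = [
  { task: "T2", line: 171, from: "Anthropic · cascade", to: "Anthropic" },
  { task: "T3", line: 175, from: "Success rate · Oracle mode", to: "Task Success Rate · Oracle mode" },
  { task: "T6", line: 224, from: "Accuracy", to: "EVA-Accuracy" },
  { task: "T7", line: 225, from: "EVA-A · PASS@3", to: "Pass@1" }, // pending open question 3
  { task: "T9", line: 230, from: "cascade · mixed", to: "cascade · mixed" }, // keep as-is
  { task: "T10", line: 256, from: "Experience", to: "EVA-Experience" },
  { task: "T11", line: 257, from: "EVA-X · PASS@3", to: "Pass@1" }, // pending open question 3
  { task: "T13", line: 262, from: "Google · S2S", to: "Google · S2S" }, // keep as-is
];

// Stand-in for index.html: blank lines plus hypothetical markup on the targeted lines.
const page: string[] = [];
for (let i = 0; i < 292; i++) page.push("");
for (const e of cardEdits) page[e.line - 1] = `<span>${e.from}</span>`;

const result = applyLineEdits(page, cardEdits);
console.log(result.lines[223]); // <span>EVA-Accuracy</span>
console.log(result.missing);    // []
```

The script is meant to run once against the unmodified file: on a second pass T6 and T10 would match again inside their own replacements. A non-empty `missing` list means a line number in this task list has drifted.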

Open questions (need answers before implementing)

  1. Title link URLs. Use same as the "View full leaderboard" links? EOG → https://enterpriseops-gym.github.io/, EVA → https://servicenow.github.io/eva/. Confirm.
  2. Hover mechanism. Native title="" attribute (simple, less styled) vs custom CSS/JS tooltip (accessible, on-brand, more work)? Recommend custom accessible tooltip.
  3. T7/T11. Confirm full replace of EVA-A · PASS@3 / EVA-X · PASS@3 with just Pass@1 (drops the EVA-A/EVA-X code and changes @3 → @1).
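
If question 2 is answered in favor of a custom tooltip, the markup for the hover definitions (T4, T8, T12) and the linked titles (T1, T5) might look roughly like the output of the helper below. This is a sketch under assumptions, not the page's actual implementation: `HoverDef`, `renderWithTooltip`, the `tip-trigger` and `tip` class names, and the ids are all hypothetical, and any title URL passed as `href` is still unconfirmed per question 1.

```typescript
// Escapes text for use in HTML text nodes and double-quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical shape for one hover definition; field names are illustrative.
interface HoverDef {
  id: string;         // unique id that ties the trigger to its tooltip
  label: string;      // visible text
  definition: string; // tooltip text
  href?: string;      // if set, the label is rendered as a link (T1, T5)
}

// Emits a focusable trigger plus a role="tooltip" element linked via
// aria-describedby, so assistive technology announces the definition.
function renderWithTooltip(def: HoverDef): string {
  const id = escapeHtml(def.id);
  const label = escapeHtml(def.label);
  const trigger =
    def.href !== undefined
      ? `<a class="tip-trigger" href="${escapeHtml(def.href)}" aria-describedby="${id}">${label}</a>`
      : `<span class="tip-trigger" tabindex="0" aria-describedby="${id}">${label}</span>`;
  return `${trigger}<span class="tip" role="tooltip" id="${id}">${escapeHtml(def.definition)}</span>`;
}

// T4: the EnterpriseOps-Gym metric label with its definition.
const metricTip = renderWithTooltip({
  id: "tip-eog-success", // hypothetical id
  label: "Task Success Rate · Oracle mode",
  definition: "A task passes only if all verification conditions are met.",
});
console.log(metricTip);
```

The helper only produces markup. Hiding `.tip` by default, revealing it on hover and keyboard focus of `.tip-trigger`, and dismissing it with Escape would still need CSS and a small script. A native `title=""` attribute skips all of that but offers no styling and is not reliably exposed to keyboard or touch users.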