"""Alternative Agents leaderboard page. The canonical OpenHands Index leaderboard (Home + the per-category pages) ranks default OpenHands agent runs from ``results/{model}/`` in the openhands-index-results repo. Third-party harnesses (Claude Code, Codex, Gemini CLI, OpenHands Sub-agents, ...) live under ``alternative_agents/{type}/{model}/`` and aren't directly comparable to default OpenHands runs (different scaffolds, different cost/runtime characteristics), so they get their own standalone page instead of being mixed into the same ranking. This page is intentionally a single Overall view (no per-category subpages) — the alternative-agents dataset is small (one row per harness × model) and the goal is "show me all the alternatives at a glance", not "drill into Issue Resolution for Codex". To make same-model comparisons easier, the page also appends canonical OpenHands rows for any language model that appears in the alternative agent dataset. The match is exact, so ``Gemini-3-Pro`` and ``Gemini-3.1-Pro`` remain distinct entries. """ import matplotlib matplotlib.use('Agg') import pandas as pd import gradio as gr from simple_data_loader import SimpleLeaderboardViewer from ui_components import ( create_leaderboard_display, get_full_leaderboard_data, ) ALTERNATIVE_AGENTS_INTRO = """
Third-party agent harnesses running the OpenHands Index benchmarks. To make direct comparisons easier, this page also includes the canonical OpenHands row whenever the exact same language model appears under an alternative harness. Cost and runtime numbers still come from each harness's own instrumentation and aren't directly comparable across harnesses.