openhands committed on
Commit bb0f7af · 1 Parent(s): 4ab5f97

docs: Update descriptive text to use Average Score and Total Cost

- Changed 'Overall score' to 'Average score' in content.py and about.py
- Changed 'Overall cost' to 'Total cost' in content.py and about.py
- Updated the Total cost description in about.py to reflect that it's a sum, not an average

Co-authored-by: openhands <openhands@all-hands.dev>

Files changed (2)
  1. about.py +2 -2
  2. content.py +2 -2
about.py CHANGED
@@ -62,8 +62,8 @@ def build_page():
  The OpenHands Index Overall Leaderboard provides a high-level view of agent performance and efficiency:
  </p>
  <ul class="info-list">
- <li><strong>Overall score</strong>: A macro-average across all benchmarks (equal weighting)</li>
- <li><strong>Overall cost</strong>: Average cost per task in USD, aggregated across benchmarks with reported cost</li>
+ <li><strong>Average score</strong>: A macro-average across all benchmarks (equal weighting)</li>
+ <li><strong>Total cost</strong>: Sum of costs across all categories, in USD</li>
  </ul>
  <p>
  Individual benchmark pages provide:
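
For reviewers, the two definitions in the updated about.py text can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the function names and the dictionary inputs are hypothetical, and skipping entries with no reported cost is an assumption carried over from the old wording ("with reported cost").

```python
from statistics import mean
from typing import Mapping, Optional


def average_score(benchmark_scores: Mapping[str, float]) -> float:
    """Macro-average: every benchmark counts equally (hypothetical helper)."""
    if not benchmark_scores:
        raise ValueError("no benchmark scores to average")
    return mean(benchmark_scores.values())


def total_cost(costs_usd: Mapping[str, Optional[float]]) -> float:
    """Sum of costs in USD (hypothetical helper).

    Assumption: an entry with no reported cost is stored as None and skipped.
    """
    return sum(cost for cost in costs_usd.values() if cost is not None)
```

The point of the rename is visible in the return values: `average_score` stays on the same scale as a single benchmark score, while `total_cost` grows with every entry that reports a cost.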
content.py CHANGED
@@ -26,10 +26,10 @@ INTRO_PARAGRAPH = """

  <ul class="info-list">
  <li>
- <strong>Overall score:</strong> A macro-average of the five category-level average scores. Each category contributes equally, regardless of how many benchmarks it includes. This ensures fair comparisons across agents with different domain strengths.
+ <strong>Average score:</strong> A macro-average of the five category-level average scores. Each category contributes equally, regardless of how many benchmarks it includes. This ensures fair comparisons across agents with different domain strengths.
  </li>
  <li>
- <strong>Overall cost:</strong> A macro-average of the agent’s cost per problem across all categories, in USD. Each category contributes equally.
+ <strong>Total cost:</strong> A macro-average of the agent’s cost per problem across all categories, in USD. Each category contributes equally.
  </li>
  </ul>
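
The phrase "each category contributes equally, regardless of how many benchmarks it includes" in content.py can be illustrated with a short sketch. It assumes scores are grouped per category in a plain mapping; the function names and the two example categories are hypothetical (the real index uses five categories), and the pooled average is included only for contrast.

```python
from statistics import mean
from typing import Mapping, Sequence


def category_macro_average(scores_by_category: Mapping[str, Sequence[float]]) -> float:
    """Average within each category first, then average the category means."""
    category_means = [mean(scores) for scores in scores_by_category.values() if scores]
    if not category_means:
        raise ValueError("no category has any scores")
    return mean(category_means)


def pooled_average(scores_by_category: Mapping[str, Sequence[float]]) -> float:
    """Contrast case: every benchmark counts equally, so large categories dominate."""
    all_scores = [s for scores in scores_by_category.values() for s in scores]
    return mean(all_scores)


# Hypothetical data: one category with three benchmarks, one with a single benchmark.
example = {"category_a": [80.0, 60.0, 40.0], "category_b": [20.0]}

print(category_macro_average(example))  # 40.0 (mean of 60.0 and 20.0)
print(pooled_average(example))          # 50.0 (mean of all four scores)
```

With the macro-average, the single-benchmark category carries the same weight as the three-benchmark one, which is the behavior the description attributes to the Average score.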