Space: OpenHands/openhands-index

openhands committed 30 days ago

Commit: bb0f7af
Parent: 4ab5f97

docs: Update descriptive text to use Average Score and Total Cost

- Changed 'Overall score' to 'Average score' in content.py and about.py
- Changed 'Overall cost' to 'Total cost' in content.py and about.py
- Updated Total cost description to reflect that it's a sum, not an average

Co-authored-by: openhands <openhands@all-hands.dev>

Files changed (2)

about.py +2 -2
content.py +2 -2

about.py CHANGED

@@ -62,8 +62,8 @@ def build_page():
                 The OpenHands Index Overall Leaderboard provides a high-level view of agent performance and efficiency:
             </p>
             <ul class="info-list">
-                <li><strong>Overall score</strong>: A macro-average across all benchmarks (equal weighting)</li>
-                <li><strong>Overall cost</strong>: Average cost per task in USD, aggregated across benchmarks with reported cost</li>
+                <li><strong>Average score</strong>: A macro-average across all benchmarks (equal weighting)</li>
+                <li><strong>Total cost</strong>: Sum of costs across all categories, in USD</li>
             </ul>
             <p>
                 Individual benchmark pages provide:

content.py CHANGED

@@ -26,10 +26,10 @@ INTRO_PARAGRAPH = """
 <ul class="info-list">
     <li>
-        <strong>Overall score:</strong> A macro-average of the five category-level average scores. Each category contributes equally, regardless of how many benchmarks it includes. This ensures fair comparisons across agents with different domain strengths.
+        <strong>Average score:</strong> A macro-average of the five category-level average scores. Each category contributes equally, regardless of how many benchmarks it includes. This ensures fair comparisons across agents with different domain strengths.
     </li>
     <li>
-        <strong>Overall cost:</strong> A macro-average of the agent’s cost per problem across all categories, in USD. Each category contributes equally.
+        <strong>Total cost:</strong> A macro-average of the agent’s cost per problem across all categories, in USD. Each category contributes equally.
     </li>
 </ul>
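
For reference, the two aggregates named in this commit could be computed roughly as follows. This is a minimal sketch, not the leaderboard's actual code: the function names `average_score` and `total_cost`, the category names, and the input shapes are all hypothetical. It follows the commit message and the about.py wording in treating Total cost as a sum; note that the new content.py text still describes it as a macro-average.

```python
from statistics import mean


def average_score(category_scores: dict[str, list[float]]) -> float:
    """Macro-average: the mean of per-category means.

    Each category contributes equally, regardless of how many
    benchmarks it includes. Categories with no scores are skipped;
    statistics.mean raises StatisticsError if none remain.
    """
    per_category = [mean(scores) for scores in category_scores.values() if scores]
    return mean(per_category)


def total_cost(category_costs: dict[str, float]) -> float:
    """Total cost in USD: a plain sum over categories, not an average."""
    return sum(category_costs.values())


# Hypothetical data: category names and numbers are made up for illustration.
scores = {
    "category_a": [0.25, 0.75],  # two benchmarks, category mean 0.5
    "category_b": [1.0],         # one benchmark, category mean 1.0
}
costs = {"category_a": 1.5, "category_b": 2.25}

print(average_score(scores))  # 0.75
print(total_cost(costs))      # 3.75
```

With these inputs the macro-average is 0.75, whereas pooling all three benchmark scores would give about 0.67. That gap is the effect of the equal category weighting described in the text.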