openhands committed
Commit bb0f7af · 1 parent: 4ab5f97
docs: Update descriptive text to use Average Score and Total Cost
- Changed 'Overall score' to 'Average score' in content.py and about.py
- Changed 'Overall cost' to 'Total cost' in content.py and about.py
- Updated Total cost description to reflect that it's a sum, not an average
Co-authored-by: openhands <openhands@all-hands.dev>
- about.py +2 -2
- content.py +2 -2
about.py CHANGED
@@ -62,8 +62,8 @@ def build_page():
 The OpenHands Index Overall Leaderboard provides a high-level view of agent performance and efficiency:
 </p>
 <ul class="info-list">
-<li><strong>
-<li><strong>
+<li><strong>Average score</strong>: A macro-average across all benchmarks (equal weighting)</li>
+<li><strong>Total cost</strong>: Sum of costs across all categories, in USD</li>
 </ul>
 <p>
 Individual benchmark pages provide:
content.py CHANGED
@@ -26,10 +26,10 @@ INTRO_PARAGRAPH = """
 
 <ul class="info-list">
 <li>
-<strong>
+<strong>Average score:</strong> A macro-average of the five category-level average scores. Each category contributes equally, regardless of how many benchmarks it includes. This ensures fair comparisons across agents with different domain strengths.
 </li>
 <li>
-<strong>
+<strong>Total cost:</strong> A macro-average of the agent’s cost per problem across all categories, in USD. Each category contributes equally.
 </li>
 </ul>
 
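The two metrics named in this commit can be illustrated with a minimal sketch. It is not the Space's actual implementation: the `category_results` structure, its field names, the three placeholder categories, and all numbers are hypothetical. The leaderboard text refers to five categories, and three are used here only for brevity. The sketch assumes "Average score" is the mean of per-category mean scores and "Total cost" is the sum of per-category costs, as the commit message and the about.py text describe.

```python
from statistics import mean

# Hypothetical per-category results for one agent (names and numbers are illustrative).
category_results = {
    "category_a": {"scores": [0.50, 0.70], "cost_usd": 12.0},
    "category_b": {"scores": [0.40], "cost_usd": 3.5},
    "category_c": {"scores": [0.90, 0.60, 0.30], "cost_usd": 8.5},
}


def average_score(results):
    """Macro-average: mean of the per-category mean scores.

    Each category contributes equally, regardless of how many
    benchmark scores it contains.
    """
    return mean(mean(r["scores"]) for r in results.values())


def total_cost(results):
    """Total cost: sum of the per-category costs, in USD."""
    return sum(r["cost_usd"] for r in results.values())


if __name__ == "__main__":
    print(f"Average score: {average_score(category_results):.4f}")
    print(f"Total cost: ${total_cost(category_results):.2f}")
```

With these placeholder numbers the per-category means are 0.6, 0.4, and 0.6, so the macro-average is about 0.533. Pooling all six scores would instead give about 0.567, because the category with three benchmarks would then carry more weight.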