Commit eb1e409 · committed by openhands
Parent(s): 4d0ae13
Update Frontend description to use SWE-bench Multimodal (Verified)
- Clarify that Frontend category uses the dev set of SWE-bench Multimodal
- Note that it uses only problems verified as solvable by human review
- Add links to the dataset and solvability annotations
Co-authored-by: openhands <openhands@all-hands.dev>
content.py +1 -1
@@ -57,7 +57,7 @@ For detailed results, use the links above to explore individual benchmark pages.
 FRONTEND_DEVELOPMENT_DESCRIPTION = """
 The **Frontend** category evaluates agents on their ability to build user interfaces and web applications. This tests skills in HTML, CSS, JavaScript frameworks, responsive design, and creating interactive user experiences.
 <br><br>
-This category
+This category uses the dev set of <a href="https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal" target="_blank" rel="noopener noreferrer">SWE-bench Multimodal (Verified)</a>, a version of SWE-bench Multimodal that includes only <a href="https://github.com/OpenHands/benchmarks/blob/main/benchmarks/swebenchmultimodal/ambiguity_annotations.json" target="_blank" rel="noopener noreferrer">problems verified as solveable</a> by human review.
 <br>
 """
 TEST_GENERATION_DESCRIPTION = """