OpenHands/openhands-index
openhands committed on Jan 26

Commit eb1e409 (1 parent: 4d0ae13)

Update Frontend description to use SWE-bench Multimodal (Verified)

- Clarify that the Frontend category uses the dev set of SWE-bench Multimodal
- Note that it uses only problems verified as solvable by human review
- Add links to the dataset and the solvability annotations

Co-authored-by: openhands <openhands@all-hands.dev>

Files changed (1)

content.py +1 -1

@@ -57,7 +57,7 @@ For detailed results, use the links above to explore individual benchmark pages.
 FRONTEND_DEVELOPMENT_DESCRIPTION = """
 The **Frontend** category evaluates agents on their ability to build user interfaces and web applications. This tests skills in HTML, CSS, JavaScript frameworks, responsive design, and creating interactive user experiences.
 <br><br>
-This category includes SWE-bench Multimodal, which challenges agents to solve GitHub issues that include visual context like screenshots and diagrams.
+This category uses the dev set of <a href="https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal" target="_blank" rel="noopener noreferrer">SWE-bench Multimodal (Verified)</a>, a version of SWE-bench Multimodal that includes only <a href="https://github.com/OpenHands/benchmarks/blob/main/benchmarks/swebenchmultimodal/ambiguity_annotations.json" target="_blank" rel="noopener noreferrer">problems verified as solveable</a> by human review.
 <br>
 """
 TEST_GENERATION_DESCRIPTION = """
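
For readers wondering what "includes only problems verified as solvable" amounts to in practice, here is a minimal sketch of the filtering step. It is not the code used by the benchmark: the `filter_verified` helper, the `solvable` annotation key, and the sample instance IDs are all hypothetical, and the real schema of `ambiguity_annotations.json` may differ.

```python
def filter_verified(instances, annotations):
    """Keep only instances whose ID is marked solvable in the annotations.

    `annotations` maps instance ID -> dict with a boolean "solvable" key.
    This schema is an assumption made for illustration only.
    """
    solvable_ids = {
        instance_id
        for instance_id, note in annotations.items()
        if note.get("solvable", False)
    }
    return [inst for inst in instances if inst["instance_id"] in solvable_ids]


# Hypothetical stand-ins for the dev set and the human-review annotations.
dev_instances = [
    {"instance_id": "example__repo-1"},
    {"instance_id": "example__repo-2"},
    {"instance_id": "example__repo-3"},
]
annotations = {
    "example__repo-1": {"solvable": True},
    "example__repo-2": {"solvable": False},
    # "example__repo-3" has no annotation, so it is excluded.
}

verified = filter_verified(dev_instances, annotations)
print([inst["instance_id"] for inst in verified])
```

In this sketch, instances without an annotation are treated as unverified and dropped, which is the conservative choice for a "verified" subset.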