openhands committed
Commit eb1e409 · 1 Parent(s): 4d0ae13

Update Frontend description to use SWE-bench Multimodal (Verified)

- Clarify that Frontend category uses the dev set of SWE-bench Multimodal
- Note that it uses only problems verified as solvable by human review
- Add links to the dataset and solvability annotations

Co-authored-by: openhands <openhands@all-hands.dev>

Files changed (1)
  1. content.py +1 -1
content.py CHANGED
@@ -57,7 +57,7 @@ For detailed results, use the links above to explore individual benchmark pages.
  FRONTEND_DEVELOPMENT_DESCRIPTION = """
  The **Frontend** category evaluates agents on their ability to build user interfaces and web applications. This tests skills in HTML, CSS, JavaScript frameworks, responsive design, and creating interactive user experiences.
  <br><br>
- This category includes SWE-bench Multimodal, which challenges agents to solve GitHub issues that include visual context like screenshots and diagrams.
+ This category uses the dev set of <a href="https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal" target="_blank" rel="noopener noreferrer">SWE-bench Multimodal (Verified)</a>, a version of SWE-bench Multimodal that includes only <a href="https://github.com/OpenHands/benchmarks/blob/main/benchmarks/swebenchmultimodal/ambiguity_annotations.json" target="_blank" rel="noopener noreferrer">problems verified as solveable</a> by human review.
  <br>
  """
  TEST_GENERATION_DESCRIPTION = """
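
The new description says the Frontend category keeps only dev-set problems that human reviewers verified as solvable. A minimal sketch of that kind of filtering is shown below. The annotation layout (a mapping from instance ID to a record with a boolean `solvable` field), the `filter_solvable` helper, and the sample instance IDs are all assumptions made for illustration; the real schema of `ambiguity_annotations.json` is not shown in this commit and may differ.

```python
from typing import Any


def filter_solvable(
    instances: list[dict[str, Any]],
    annotations: dict[str, dict[str, Any]],
    solvable_key: str = "solvable",  # hypothetical field name, not confirmed by the commit
) -> list[dict[str, Any]]:
    """Keep only instances whose annotation explicitly marks them as solvable.

    Instances with no annotation, or with a falsy flag, are dropped.
    """
    return [
        inst
        for inst in instances
        if annotations.get(inst["instance_id"], {}).get(solvable_key) is True
    ]


# Hypothetical stand-ins for the dev set and the human-review annotations.
dev_set = [
    {"instance_id": "demo__repo-1"},
    {"instance_id": "demo__repo-2"},
    {"instance_id": "demo__repo-3"},
    {"instance_id": "demo__repo-4"},
]
annotations = {
    "demo__repo-1": {"solvable": True},
    "demo__repo-2": {"solvable": False},
    "demo__repo-3": {"solvable": True},
    # demo__repo-4 has no annotation and is therefore excluded.
}

verified = filter_solvable(dev_set, annotations)
print([inst["instance_id"] for inst in verified])
```

Treating a missing annotation as "not verified" is a conservative choice in this sketch: a problem only counts once a reviewer has explicitly marked it.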