
OpenHands Index Data Structure

This document describes the expected data structure for the openhands-index-results GitHub repository.

Repository Structure

The data should be organized in the following structure:

openhands-index-results/
├── 1.0.0-dev1/               # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl            # Test split results
│   ├── validation.jsonl      # Validation split results
│   ├── swe-bench.jsonl       # Individual benchmark results
│   ├── multi-swe-bench.jsonl
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json        # Configuration file
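The layout above can be checked programmatically. The sketch below is an illustration, not part of the app; the file list is taken from the tree above, and the helper name is hypothetical.

```python
from pathlib import Path

# Files expected inside each version directory, per the layout above.
EXPECTED_FILES = [
    "test.jsonl",
    "validation.jsonl",
    "swe-bench.jsonl",
    "multi-swe-bench.jsonl",
    "swe-bench-multimodal.jsonl",
    "swt-bench.jsonl",
    "commit0.jsonl",
    "gaia.jsonl",
    "agenteval.json",
]

def missing_files(version_dir: Path) -> list[str]:
    """Return the expected files that are absent from a version directory."""
    return [name for name in EXPECTED_FILES if not (version_dir / name).exists()]
```

An empty returned list means the version directory matches the expected structure.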

File Formats

JSONL Files (test.jsonl, validation.jsonl, etc.)

Each line in a JSONL file should be a JSON object representing one agent's results:

{
  "Agent_Name": "OpenHands CodeAct v2.1",
  "Llm_Base": "claude-3-5-sonnet-20241022",
  "Openness": "closed_api_available",
  "Tool_Usage": "standard",
  "Score": 48.3,
  "Metric": "resolve_rate",
  "Submission_Time": "2025-11-24T19:56:00.092865",
  "Tags": ["swe-bench"],
  "Total_Cost": 34.15,
  "Total_Runtime": 541.5
}
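Since each line is an independent JSON object, a results file can be parsed line by line. This is a minimal reader sketch (the function name is hypothetical, not the app's actual loader); it skips blank lines, which a strict line-per-record parser would otherwise choke on.

```python
import json

def load_results(path: str) -> list[dict]:
    """Parse a results JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```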

Configuration File (agenteval.json)

The configuration file defines the benchmark structure:

{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "multi-swe-bench",
            "tags": ["multi-swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "multi-swe-bench",
            "tags": ["multi-swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
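Given a parsed agenteval.json, the split-to-tasks mapping can be extracted with a short helper. This is an illustrative sketch assuming only the `suite_config` shape shown above; the function name is not from the app.

```python
def tasks_by_split(config: dict) -> dict[str, list[str]]:
    """Map each split name to its list of task names from a suite config."""
    suite = config["suite_config"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in suite["splits"]
    }
```

For the configuration above, this yields the same six benchmark names under both the "test" and "validation" splits.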

Data Loading Process

  1. GitHub Repository Check: The app first attempts to clone the openhands-index-results repository
  2. Version Directory: It then looks for a directory matching CONFIG_NAME (currently "1.0.0-dev1")
  3. Fallback to Mock Data: If the GitHub data is unavailable, the app falls back to local mock data in mock_results/
  4. Data Extraction: The data is copied to /tmp/oh_index/data/{version}/extracted/{version}/
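The steps above can be sketched as follows. This is a simplified illustration, not the app's actual code: the function name and the `repo_dir`/`mock_dir`/`dest` parameters are hypothetical, and the clone step is assumed to have already populated `repo_dir`.

```python
import shutil
from pathlib import Path

def load_data(repo_dir: Path, mock_dir: Path, version: str, dest: Path) -> Path:
    """Copy the version directory into place, falling back to mock data."""
    source = repo_dir / version
    if not source.is_dir():            # GitHub data unavailable
        source = mock_dir / version    # fall back to local mock data
    # Mirrors the /tmp/oh_index/data/{version}/extracted/{version}/ layout.
    target = dest / version / "extracted" / version
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```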

Updating Data

To update the leaderboard data:

  1. Push new JSONL files to the openhands-index-results repository
  2. Ensure the version directory matches CONFIG_NAME in config.py
  3. The app will automatically fetch the latest data on restart

Mock Data

Mock data is stored in mock_results/1.0.0-dev1/ and is used:

  • During development and testing
  • When the GitHub repository is unavailable
  • As a template for the expected data format