# OpenHands Index Data Structure

This document describes the expected data structure for the `openhands-index-results` GitHub repository.

## Repository Structure

The data should be organized in the following structure:

```
openhands-index-results/
├── 1.0.0-dev1/              # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl            # Test split results
│   ├── validation.jsonl      # Validation split results
│   ├── swe-bench.jsonl       # Individual benchmark results
│   ├── multi-swe-bench.jsonl
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json        # Configuration file
```
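
As a quick sanity check before pushing, the layout above can be verified with a small script. This is only a sketch: the helper name `missing_files` is hypothetical, and treating every file in the tree as required is an assumption rather than something the app is known to enforce.

```python
from pathlib import Path

# File names copied from the tree above. Whether the app needs all of
# them is an assumption made for this sketch.
EXPECTED_FILES = [
    "test.jsonl",
    "validation.jsonl",
    "swe-bench.jsonl",
    "multi-swe-bench.jsonl",
    "swe-bench-multimodal.jsonl",
    "swt-bench.jsonl",
    "commit0.jsonl",
    "gaia.jsonl",
    "agenteval.json",
]


def missing_files(version_dir):
    """Return the expected file names that are absent from version_dir, sorted."""
    version_dir = Path(version_dir)
    return sorted(name for name in EXPECTED_FILES if not (version_dir / name).is_file())
```

Calling `missing_files("openhands-index-results/1.0.0-dev1")` on a complete checkout would return an empty list.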

## File Formats

### JSONL Files (test.jsonl, validation.jsonl, etc.)

Each line in a JSONL file should be a JSON object representing one agent's results:

```json
{
  "Agent_Name": "OpenHands CodeAct v2.1",
  "Llm_Base": "claude-3-5-sonnet-20241022",
  "Openness": "closed_api_available",
  "Tool_Usage": "standard",
  "Score": 48.3,
  "Metric": "resolve_rate",
  "Submission_Time": "2025-11-24T19:56:00.092865",
  "Tags": ["swe-bench"],
  "Total_Cost": 34.15,
  "Total_Runtime": 541.5
}
```
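
A minimal sketch of reading such a file follows. The function name `parse_results` is hypothetical, and requiring every field from the example record is an assumption; the real loader may accept records with fewer fields.

```python
import json
from datetime import datetime

# Field names taken from the example record above. Treating all of them
# as required is an assumption of this sketch.
EXPECTED_FIELDS = {
    "Agent_Name", "Llm_Base", "Openness", "Tool_Usage", "Score",
    "Metric", "Submission_Time", "Tags", "Total_Cost", "Total_Runtime",
}


def parse_results(text):
    """Parse JSONL text into a list of dicts, one per non-empty line."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = EXPECTED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        # The example timestamp is ISO 8601 without a timezone; this call
        # raises ValueError if the value cannot be parsed.
        datetime.fromisoformat(record["Submission_Time"])
        records.append(record)
    return records
```

To read a file from disk, pass its contents, for example `parse_results(Path("test.jsonl").read_text())`.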

### Configuration File (agenteval.json)

The configuration file defines the benchmark structure:

```json
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "multi-swe-bench",
            "tags": ["multi-swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "multi-swe-bench",
            "tags": ["multi-swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
```
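
To illustrate how this structure can be consumed, the sketch below maps each split to its task names. The helper `tasks_by_split` is hypothetical and only assumes the keys shown in the example above.

```python
import json


def tasks_by_split(config):
    """Return {split name: [task names]} from a parsed agenteval.json dict."""
    splits = config["suite_config"]["splits"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in splits
    }


# Trimmed-down config in the same shape as the example above.
example = json.loads("""
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {"name": "test", "tasks": [
        {"name": "swe-bench", "tags": ["swe-bench"]},
        {"name": "gaia", "tags": ["gaia"]}
      ]}
    ]
  }
}
""")
print(tasks_by_split(example))  # {'test': ['swe-bench', 'gaia']}
```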

## Data Loading Process

1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository.
2. **Version Directory**: It then looks for a directory matching `CONFIG_NAME` (currently `1.0.0-dev1`).
3. **Fallback to Mock Data**: If the GitHub data is unavailable, the app falls back to local mock data in `mock_results/`.
4. **Data Extraction**: The selected data is copied to `/tmp/oh_index/data/{version}/extracted/{version}/`.
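
The fallback and extraction steps above can be sketched as follows. This is not the app's actual code: `prepare_data` and its parameters are hypothetical, the clone step is assumed to have already run (or failed) before the function is called, and the sketch treats a missing version directory in the clone as "GitHub data unavailable".

```python
import shutil
from pathlib import Path

CONFIG_NAME = "1.0.0-dev1"  # value quoted in this document


def prepare_data(repo_dir, mock_dir, data_root, version=CONFIG_NAME):
    """Copy the version directory from the clone, or from mock data, into data_root.

    Returns (source, target) so the caller can see which source was used.
    """
    repo_version = Path(repo_dir) / version
    mock_version = Path(mock_dir) / version

    if repo_version.is_dir():
        source = repo_version          # cloned repository has this version
    elif mock_version.is_dir():
        source = mock_version          # fall back to local mock data
    else:
        raise FileNotFoundError(f"no data found for version {version!r}")

    # Mirrors the documented layout: {data_root}/{version}/extracted/{version}/
    target = Path(data_root) / version / "extracted" / version
    if target.exists():
        shutil.rmtree(target)          # replace stale data from an earlier run
    shutil.copytree(source, target)    # also creates missing parent directories
    return source, target
```

With the paths from this document, a call might look like `prepare_data("openhands-index-results", "mock_results", "/tmp/oh_index/data")`.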

## Updating Data

To update the leaderboard data:

1. Push new JSONL files to the `openhands-index-results` repository
2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
3. The app will automatically fetch the latest data on restart

## Mock Data

Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
- During development and testing
- When the GitHub repository is unavailable
- As a template for the expected data format