# OpenHands Index Data Structure

This document describes the expected data structure for the `openhands-index-results` GitHub repository.

## Repository Structure

The data should be organized in the following structure:

```
openhands-index-results/
├── 1.0.0-dev1/              # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl            # Test split results
│   ├── validation.jsonl      # Validation split results
│   ├── swe-bench.jsonl       # Individual benchmark results
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json        # Configuration file
```

## File Formats

### Agent Directory Structure

Each agent has its own directory containing two files:

**metadata.json** - Agent and model information:
```json
{
  "agent_name": "OpenHands CodeAct",
  "agent_version": "v1.8.3",
  "model": "claude-4.5-opus",
  "openness": "closed_api_available",
  "country": "us",
  "tool_usage": "standard",
  "submission_time": "2026-01-27T01:24:15.735789+00:00",
  "directory_name": "claude-4.5-opus",
  "release_date": "2025-11-24",
  "parameter_count_b": null,
  "active_parameter_count_b": null
}
```

**scores.json** - Array of benchmark results:
```json
[
  {
    "benchmark": "swe-bench",
    "score": 76.6,
    "metric": "accuracy",
    "cost_per_instance": 1.82,
    "average_runtime": 325.0,
    "full_archive": "https://results.eval.all-hands.dev/eval-21370451733-...",
    "tags": ["swe-bench"],
    "agent_version": "v1.8.3",
    "submission_time": "2026-01-27T01:24:15.735789+00:00"
  }
]
```
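As a rough illustration of how the two files fit together, the sketch below reads one agent directory and attaches the agent and model fields to each benchmark result. The function name `load_agent_dir` and the flattened row shape are assumptions made for this example, not part of the app.

```python
import json
from pathlib import Path


def load_agent_dir(agent_dir):
    """Read metadata.json and scores.json from one agent directory.

    Hypothetical helper: returns one dict per benchmark result, with the
    agent name and model copied in from metadata.json.
    """
    agent_dir = Path(agent_dir)
    metadata = json.loads((agent_dir / "metadata.json").read_text(encoding="utf-8"))
    scores = json.loads((agent_dir / "scores.json").read_text(encoding="utf-8"))
    if not isinstance(scores, list):
        raise ValueError("scores.json must contain a JSON array")
    return [
        {
            "agent_name": metadata.get("agent_name"),
            "model": metadata.get("model"),
            **entry,  # benchmark, score, metric, ... as stored in scores.json
        }
        for entry in scores
    ]
```

Calling `load_agent_dir("claude-4.5-opus")` on a directory laid out as above would yield one row per entry in `scores.json`.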

### Configuration File (agenteval.json)

The configuration file defines the benchmark structure:

```json
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
```
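To show how this configuration can be read, here is a minimal sketch that maps each split name to its task names. The helper `tasks_by_split` is hypothetical, and the sketch assumes only the keys shown in the example above.

```python
import json


def tasks_by_split(config):
    """Map each split name to the list of task names it contains."""
    splits = config["suite_config"]["splits"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in splits
    }


if __name__ == "__main__":
    # Assumes agenteval.json sits in the current directory.
    with open("agenteval.json", encoding="utf-8") as fh:
        for split, tasks in tasks_by_split(json.load(fh)).items():
            print(split, tasks)
```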

## Data Loading Process

1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository.
2. **Version Directory**: It looks in the clone for a directory matching `CONFIG_NAME` (currently `1.0.0-dev1`).
3. **Fallback to Mock Data**: If the GitHub data is unavailable, it falls back to the local mock data in `mock_results/`.
4. **Data Extraction**: It copies the selected data to `/tmp/oh_index/data/{version}/extracted/{version}/`.
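
The steps above can be sketched roughly as follows. This is not the app's actual code: the function names, the `repo_url` parameter, and the timeout are assumptions for illustration, and only the directory names and the extraction path come from this document.

```python
import shutil
import subprocess
from pathlib import Path

CONFIG_NAME = "1.0.0-dev1"  # mirrors CONFIG_NAME in config.py


def try_clone(repo_url, dest):
    """Step 1: attempt a shallow clone; return True on success."""
    try:
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(dest)],
            check=True,
            capture_output=True,
            timeout=120,  # assumed limit, not taken from the app
        )
        return True
    except (subprocess.SubprocessError, OSError):
        # Covers a failed or timed-out clone and a missing git binary.
        return False


def pick_source(clone_dir, mock_root, version=CONFIG_NAME):
    """Steps 2-3: use the cloned version directory, else the mock data."""
    candidate = Path(clone_dir) / version
    if candidate.is_dir():
        return candidate
    return Path(mock_root) / version


def extract(source, base="/tmp/oh_index/data", version=CONFIG_NAME):
    """Step 4: copy the chosen data to {base}/{version}/extracted/{version}/."""
    target = Path(base) / version / "extracted" / version
    shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    return target
```

In this sketch a failed clone simply leaves `clone_dir` without a version directory, so `pick_source` returns the `mock_results/` path.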

## Updating Data

To update the leaderboard data:

1. Push new JSONL files to the `openhands-index-results` repository
2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
3. The app will automatically fetch the latest data on restart
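
Before pushing, it can help to confirm that each new file parses. The check below is a minimal sketch that assumes one JSON object per non-blank line; this document does not specify the row schema, so no fields are validated. The name `find_jsonl_errors` is hypothetical.

```python
import json
from pathlib import Path


def find_jsonl_errors(path):
    """Return (line_number, message) pairs for lines that are not JSON objects."""
    errors = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((line_no, exc.msg))
                continue
            if not isinstance(record, dict):
                errors.append((line_no, "expected a JSON object"))
    return errors
```

An empty list from `find_jsonl_errors("1.0.0-dev1/test.jsonl")` means every non-blank line parsed as a JSON object.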

## Mock Data

Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
- During development and testing
- When the GitHub repository is unavailable
- As a template for the expected data format