# OpenHands Index Data Structure

This document describes the expected data structure for the `openhands-index-results` GitHub repository.

## Repository Structure

The data should be organized in the following structure:

```
openhands-index-results/
├── 1.0.0-dev1/              # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl            # Test split results
│   ├── validation.jsonl      # Validation split results
│   ├── swe-bench.jsonl       # Individual benchmark results
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json        # Configuration file
```

## File Formats

### JSONL Files (test.jsonl, validation.jsonl, etc.)

Each line in a JSONL file should be a JSON object representing one agent's results:

```json
{
  "Agent_Name": "OpenHands CodeAct v2.1",
  "Llm_Base": "claude-3-5-sonnet-20241022",
  "Openness": "closed_api_available",
  "Tool_Usage": "standard",
  "Score": 48.3,
  "Metric": "resolve_rate",
  "Submission_Time": "2025-11-24T19:56:00.092865",
  "Tags": ["swe-bench"],
  "Total_Cost": 34.15,
  "Total_Runtime": 541.5
}
```
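
As a rough illustration of how such a file could be read, the sketch below parses JSONL text and checks each record for the fields shown in the example above. The helper name `parse_results` is hypothetical, and treating every field as required is an assumption made for this sketch, not something the app is known to enforce.

```python
import json

# Field names copied from the example record above. Treating all of them
# as required is an assumption of this sketch.
REQUIRED_FIELDS = {
    "Agent_Name", "Llm_Base", "Openness", "Tool_Usage", "Score",
    "Metric", "Submission_Time", "Tags", "Total_Cost", "Total_Runtime",
}


def parse_results(text):
    """Parse JSONL text into a list of dicts, one per non-empty line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(
                f"line {line_number}: missing fields {sorted(missing)}"
            )
        records.append(record)
    return records
```

Calling `parse_results(open(path).read())` on a results file would then return one dictionary per agent entry, or raise `ValueError` naming the first incomplete line.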

### Configuration File (agenteval.json)

The configuration file defines the benchmark structure:

```json
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
```
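
To make the nesting concrete, the following sketch walks a configuration of this shape and collects the task names declared for each split. The function name `tasks_by_split` is hypothetical, and the embedded config is a trimmed-down stand-in for the full example above.

```python
import json


def tasks_by_split(config):
    """Map each split name to the list of task names it declares."""
    suite = config["suite_config"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in suite["splits"]
    }


# Trimmed-down config with the same shape as the example above.
example = json.loads("""
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {"name": "test", "tasks": [{"name": "swe-bench", "tags": ["swe-bench"]}]},
      {"name": "validation", "tasks": [{"name": "gaia", "tags": ["gaia"]}]}
    ]
  }
}
""")

print(tasks_by_split(example))
```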

## Data Loading Process

1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository.
2. **Version Directory**: It then looks for a directory matching `CONFIG_NAME` (currently `1.0.0-dev1`).
3. **Fallback to Mock Data**: If the GitHub data is unavailable, the app falls back to local mock data in `mock_results/`.
4. **Data Extraction**: The selected data is copied to `/tmp/oh_index/data/{version}/extracted/{version}/`.
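
The fallback and extraction steps above can be sketched roughly as follows. This is not the app's actual code: the function names `pick_source_dir` and `extract` are hypothetical, the clone step is assumed to have already happened (or failed, in which case `clone_dir` is `None`), and the target layout simply mirrors the path pattern listed in step 4.

```python
import shutil
from pathlib import Path


def pick_source_dir(clone_dir, mock_dir, version):
    """Return the version directory from the clone, else the mock fallback."""
    if clone_dir is not None:
        candidate = Path(clone_dir) / version
        if candidate.is_dir():
            return candidate
    fallback = Path(mock_dir) / version
    if not fallback.is_dir():
        raise FileNotFoundError(f"no data found for version {version!r}")
    return fallback


def extract(source_dir, data_root, version):
    """Copy source_dir to {data_root}/{version}/extracted/{version}/."""
    target = Path(data_root) / version / "extracted" / version
    # dirs_exist_ok (Python 3.8+) lets a restart overwrite an earlier copy.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```

With `data_root` set to `/tmp/oh_index/data` and `version` set to the value of `CONFIG_NAME`, `extract` would produce the directory named in step 4.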

## Updating Data

To update the leaderboard data:

1. Push new JSONL files to the `openhands-index-results` repository
2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
3. The app will automatically fetch the latest data on restart

## Mock Data

Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
- During development and testing
- When the GitHub repository is unavailable
- As a template for the expected data format