# OpenHands Index Data Structure
This document describes the expected data structure for the `openhands-index-results` GitHub repository.
## Repository Structure
The data should be organized in the following structure:
```
openhands-index-results/
├── 1.0.0-dev1/                    # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl                 # Test split results
│   ├── validation.jsonl           # Validation split results
│   ├── swe-bench.jsonl            # Individual benchmark results
│   ├── multi-swe-bench.jsonl
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json             # Configuration file
```
## File Formats
### JSONL Files (test.jsonl, validation.jsonl, etc.)
Each line in a JSONL file should be a JSON object representing one agent's results:
```json
{
  "Agent_Name": "OpenHands CodeAct v2.1",
  "Llm_Base": "claude-3-5-sonnet-20241022",
  "Openness": "closed_api_available",
  "Tool_Usage": "standard",
  "Score": 48.3,
  "Metric": "resolve_rate",
  "Submission_Time": "2025-11-24T19:56:00.092865",
  "Tags": ["swe-bench"],
  "Total_Cost": 34.15,
  "Total_Runtime": 541.5
}
```
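The rows above can be loaded with a short validation pass. This is a minimal sketch, not the app's actual loader; the `load_results` helper and the exact set of required fields are assumptions based on the example record.

```python
import json

# Fields every result row is expected to carry (assumed from the example above).
REQUIRED_FIELDS = {
    "Agent_Name", "Llm_Base", "Openness", "Tool_Usage",
    "Score", "Metric", "Submission_Time", "Tags",
    "Total_Cost", "Total_Runtime",
}

def load_results(path):
    """Parse a JSONL results file, skipping rows that lack required fields."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            missing = REQUIRED_FIELDS - row.keys()
            if missing:
                print(f"line {lineno}: skipping row, missing {sorted(missing)}")
                continue
            rows.append(row)
    return rows
```

Skipping (rather than raising on) malformed rows is a design choice that keeps a leaderboard rendering even if one submission is incomplete.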
### Configuration File (agenteval.json)
The configuration file defines the benchmark structure:
```json
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          { "name": "swe-bench", "tags": ["swe-bench"] },
          { "name": "multi-swe-bench", "tags": ["multi-swe-bench"] },
          { "name": "swe-bench-multimodal", "tags": ["swe-bench-multimodal"] },
          { "name": "swt-bench", "tags": ["swt-bench"] },
          { "name": "commit0", "tags": ["commit0"] },
          { "name": "gaia", "tags": ["gaia"] }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          { "name": "swe-bench", "tags": ["swe-bench"] },
          { "name": "multi-swe-bench", "tags": ["multi-swe-bench"] },
          { "name": "swe-bench-multimodal", "tags": ["swe-bench-multimodal"] },
          { "name": "swt-bench", "tags": ["swt-bench"] },
          { "name": "commit0", "tags": ["commit0"] },
          { "name": "gaia", "tags": ["gaia"] }
        ]
      }
    ]
  }
}
```
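A consumer of `agenteval.json` mostly needs the split-to-tasks mapping. The sketch below shows one way to extract it, assuming only the structure documented above; the `tasks_by_split` helper is illustrative, not part of the app.

```python
import json

def tasks_by_split(config_path):
    """Map each split name in an agenteval.json file to its list of task names."""
    with open(config_path, encoding="utf-8") as f:
        suite = json.load(f)["suite_config"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in suite["splits"]
    }
```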
## Data Loading Process
1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository
2. **Version Directory**: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1")
3. **Fallback to Mock Data**: If GitHub data is unavailable, falls back to local mock data in `mock_results/`
4. **Data Extraction**: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/`
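The fallback-and-extract steps above can be sketched as follows. This is a simplified illustration, not the app's real code: `resolve_data_dir` is a hypothetical helper, and the cloning step (1) is omitted, with the checkout assumed to already exist on disk.

```python
import shutil
from pathlib import Path

CONFIG_NAME = "1.0.0-dev1"  # assumed to mirror CONFIG_NAME in config.py

def resolve_data_dir(repo_dir, mock_dir, extract_root):
    """Prefer the GitHub checkout's version directory; fall back to mock data.

    Copies the chosen source into the extraction layout
    {extract_root}/{version}/extracted/{version}/ and returns that path.
    """
    repo_version = Path(repo_dir) / CONFIG_NAME
    mock_version = Path(mock_dir) / CONFIG_NAME
    source = repo_version if repo_version.is_dir() else mock_version
    dest = Path(extract_root) / CONFIG_NAME / "extracted" / CONFIG_NAME
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean extraction
    shutil.copytree(source, dest)
    return dest
```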
## Updating Data
To update the leaderboard data:
1. Push new JSONL files to the `openhands-index-results` repository
2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
3. The app will automatically fetch the latest data on restart
## Mock Data
Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
- During development and testing
- When the GitHub repository is unavailable
- As a template for the expected data format