# OpenHands Index Data Structure
This document describes the expected data structure for the `openhands-index-results` GitHub repository.
## Repository Structure
The data should be organized in the following structure:
```
openhands-index-results/
└── 1.0.0-dev1/                    # Version directory (matches CONFIG_NAME in config.py)
    ├── test.jsonl                 # Test split results
    ├── validation.jsonl           # Validation split results
    ├── swe-bench.jsonl            # Individual benchmark results
    ├── swe-bench-multimodal.jsonl
    ├── swt-bench.jsonl
    ├── commit0.jsonl
    ├── gaia.jsonl
    └── agenteval.json             # Configuration file
```
## File Formats
### JSONL Files (test.jsonl, validation.jsonl, etc.)
Each line in a JSONL file should be a JSON object representing one agent's results:
```json
{
  "Agent_Name": "OpenHands CodeAct v2.1",
  "Llm_Base": "claude-3-5-sonnet-20241022",
  "Openness": "closed_api_available",
  "Tool_Usage": "standard",
  "Score": 48.3,
  "Metric": "resolve_rate",
  "Submission_Time": "2025-11-24T19:56:00.092865",
  "Tags": ["swe-bench"],
  "Total_Cost": 34.15,
  "Total_Runtime": 541.5
}
```
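A minimal sketch of parsing and checking one such record, assuming the fields shown in the example above are all required (the helper name and the required-field set are this document's illustration, not part of the app):

```python
import json

# Fields shown in the example record above, treated here as required.
REQUIRED_FIELDS = {
    "Agent_Name", "Llm_Base", "Openness", "Tool_Usage",
    "Score", "Metric", "Submission_Time", "Tags",
    "Total_Cost", "Total_Runtime",
}

def parse_results_line(line: str) -> dict:
    """Parse one JSONL line and verify all expected fields are present."""
    record = json.loads(line)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

line = (
    '{"Agent_Name": "OpenHands CodeAct v2.1", '
    '"Llm_Base": "claude-3-5-sonnet-20241022", '
    '"Openness": "closed_api_available", "Tool_Usage": "standard", '
    '"Score": 48.3, "Metric": "resolve_rate", '
    '"Submission_Time": "2025-11-24T19:56:00.092865", '
    '"Tags": ["swe-bench"], "Total_Cost": 34.15, "Total_Runtime": 541.5}'
)
record = parse_results_line(line)
```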
### Configuration File (agenteval.json)
The configuration file defines the benchmark structure:
```json
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
```
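Reading the configuration is a plain JSON traversal. A minimal sketch using a trimmed copy of the structure above (one task per split, for brevity):

```python
import json

# Trimmed copy of the agenteval.json structure shown above.
config = json.loads("""
{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {"name": "test", "tasks": [{"name": "swe-bench", "tags": ["swe-bench"]}]},
      {"name": "validation", "tasks": [{"name": "swe-bench", "tags": ["swe-bench"]}]}
    ]
  }
}
""")

suite = config["suite_config"]
# Map each split name to the list of task names it contains.
tasks_by_split = {
    split["name"]: [task["name"] for task in split["tasks"]]
    for split in suite["splits"]
}
```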
## Data Loading Process
1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository
2. **Version Directory**: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1")
3. **Fallback to Mock Data**: If GitHub data is unavailable, falls back to local mock data in `mock_results/`
4. **Data Extraction**: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/`
## Updating Data
To update the leaderboard data:
1. Push new JSONL files to the `openhands-index-results` repository
2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
3. The app will automatically fetch the latest data on restart
## Mock Data
Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
- During development and testing
- When the GitHub repository is unavailable
- As a template for the expected data format