Spaces:
Running
Running
| # OpenHands Index Data Structure | |
| This document describes the expected data structure for the `openhands-index-results` GitHub repository. | |
| ## Repository Structure | |
| The data should be organized in the following structure: | |
| ``` | |
| openhands-index-results/ | |
| βββ 1.0.0-dev1/ # Version directory (matches CONFIG_NAME in config.py) | |
| β βββ test.jsonl # Test split results | |
| β βββ validation.jsonl # Validation split results | |
| β βββ swe-bench.jsonl # Individual benchmark results | |
| β βββ swe-bench-multimodal.jsonl | |
| β βββ swt-bench.jsonl | |
| β βββ commit0.jsonl | |
| β βββ gaia.jsonl | |
| β βββ agenteval.json # Configuration file | |
| ``` | |
| ## File Formats | |
| ### Agent Directory Structure | |
| Each agent has its own directory containing two files: | |
| **metadata.json** - Agent and model information: | |
| ```json | |
| { | |
| "agent_name": "OpenHands CodeAct", | |
| "agent_version": "v1.8.3", | |
| "model": "claude-4.5-opus", | |
| "openness": "closed_api_available", | |
| "country": "us", | |
| "tool_usage": "standard", | |
| "submission_time": "2026-01-27T01:24:15.735789+00:00", | |
| "directory_name": "claude-4.5-opus", | |
| "release_date": "2025-11-24", | |
| "parameter_count_b": null, | |
| "active_parameter_count_b": null | |
| } | |
| ``` | |
| **scores.json** - Array of benchmark results: | |
| ```json | |
| [ | |
| { | |
| "benchmark": "swe-bench", | |
| "score": 76.6, | |
| "metric": "accuracy", | |
| "cost_per_instance": 1.82, | |
| "average_runtime": 325.0, | |
| "full_archive": "https://results.eval.all-hands.dev/eval-21370451733-...", | |
| "tags": ["swe-bench"], | |
| "agent_version": "v1.8.3", | |
| "submission_time": "2026-01-27T01:24:15.735789+00:00" | |
| } | |
| ] | |
| ``` | |
| ### Configuration File (agenteval.json) | |
| The configuration file defines the benchmark structure: | |
| ```json | |
| { | |
| "suite_config": { | |
| "name": "openhands-index", | |
| "version": "1.0.0-dev1", | |
| "splits": [ | |
| { | |
| "name": "test", | |
| "tasks": [ | |
| { | |
| "name": "swe-bench", | |
| "tags": ["swe-bench"] | |
| }, | |
| { | |
| "name": "swe-bench-multimodal", | |
| "tags": ["swe-bench-multimodal"] | |
| }, | |
| { | |
| "name": "swt-bench", | |
| "tags": ["swt-bench"] | |
| }, | |
| { | |
| "name": "commit0", | |
| "tags": ["commit0"] | |
| }, | |
| { | |
| "name": "gaia", | |
| "tags": ["gaia"] | |
| } | |
| ] | |
| }, | |
| { | |
| "name": "validation", | |
| "tasks": [ | |
| { | |
| "name": "swe-bench", | |
| "tags": ["swe-bench"] | |
| }, | |
| { | |
| "name": "swe-bench-multimodal", | |
| "tags": ["swe-bench-multimodal"] | |
| }, | |
| { | |
| "name": "swt-bench", | |
| "tags": ["swt-bench"] | |
| }, | |
| { | |
| "name": "commit0", | |
| "tags": ["commit0"] | |
| }, | |
| { | |
| "name": "gaia", | |
| "tags": ["gaia"] | |
| } | |
| ] | |
| } | |
| ] | |
| } | |
| } | |
| ``` | |
| ## Data Loading Process | |
| 1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository | |
| 2. **Version Directory**: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1") | |
| 3. **Fallback to Mock Data**: If GitHub data is unavailable, falls back to local mock data in `mock_results/` | |
| 4. **Data Extraction**: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/` | |
| ## Updating Data | |
| To update the leaderboard data: | |
| 1. Push new JSONL files to the `openhands-index-results` repository | |
| 2. Ensure the version directory matches `CONFIG_NAME` in `config.py` | |
| 3. The app will automatically fetch the latest data on restart | |
| ## Mock Data | |
| Mock data is stored in `mock_results/1.0.0-dev1/` and is used: | |
| - During development and testing | |
| - When the GitHub repository is unavailable | |
| - As a template for the expected data format | |