Spaces:

OpenHands
/

openhands-index

Running

App Files Files Community

openhands-index / DATA_STRUCTURE.md

openhands

Update mock data and docs to match new openhands-index-results format

f7e6b58 4 months ago

preview code

raw

history blame

3.86 kB

	# OpenHands Index Data Structure

	This document describes the expected data structure for the `openhands-index-results` GitHub repository.

	## Repository Structure

	The data should be organized in the following structure:

	```
	openhands-index-results/
	├── 1.0.0-dev1/ # Version directory (matches CONFIG_NAME in config.py)
	│ ├── test.jsonl # Test split results
	│ ├── validation.jsonl # Validation split results
	│ ├── swe-bench.jsonl # Individual benchmark results
	│ ├── swe-bench-multimodal.jsonl
	│ ├── swt-bench.jsonl
	│ ├── commit0.jsonl
	│ ├── gaia.jsonl
	│ └── agenteval.json # Configuration file
	```

	## File Formats

	### Agent Directory Structure

	Each agent has its own directory containing two files:

	metadata.json - Agent and model information:
	```json
	{
	"agent_name": "OpenHands CodeAct",
	"agent_version": "v1.8.3",
	"model": "claude-4.5-opus",
	"openness": "closed_api_available",
	"country": "us",
	"tool_usage": "standard",
	"submission_time": "2026-01-27T01:24:15.735789+00:00",
	"directory_name": "claude-4.5-opus",
	"release_date": "2025-11-24",
	"parameter_count_b": null,
	"active_parameter_count_b": null
	}
	```

	scores.json - Array of benchmark results:
	```json
	[
	{
	"benchmark": "swe-bench",
	"score": 76.6,
	"metric": "accuracy",
	"cost_per_instance": 1.82,
	"average_runtime": 325.0,
	"full_archive": "https://results.eval.all-hands.dev/eval-21370451733-...",
	"tags": ["swe-bench"],
	"agent_version": "v1.8.3",
	"submission_time": "2026-01-27T01:24:15.735789+00:00"
	}
	]
	```

	### Configuration File (agenteval.json)

	The configuration file defines the benchmark structure:

	```json
	{
	"suite_config": {
	"name": "openhands-index",
	"version": "1.0.0-dev1",
	"splits": [
	{
	"name": "test",
	"tasks": [
	{
	"name": "swe-bench",
	"tags": ["swe-bench"]
	},
	{
	"name": "swe-bench-multimodal",
	"tags": ["swe-bench-multimodal"]
	},
	{
	"name": "swt-bench",
	"tags": ["swt-bench"]
	},
	{
	"name": "commit0",
	"tags": ["commit0"]
	},
	{
	"name": "gaia",
	"tags": ["gaia"]
	}
	]
	},
	{
	"name": "validation",
	"tasks": [
	{
	"name": "swe-bench",
	"tags": ["swe-bench"]
	},
	{
	"name": "swe-bench-multimodal",
	"tags": ["swe-bench-multimodal"]
	},
	{
	"name": "swt-bench",
	"tags": ["swt-bench"]
	},
	{
	"name": "commit0",
	"tags": ["commit0"]
	},
	{
	"name": "gaia",
	"tags": ["gaia"]
	}
	]
	}
	]
	}
	}
	```

	## Data Loading Process

	1. GitHub Repository Check: The app first attempts to clone the `openhands-index-results` repository
	2. Version Directory: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1")
	3. Fallback to Mock Data: If GitHub data is unavailable, falls back to local mock data in `mock_results/`
	4. Data Extraction: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/`

	## Updating Data

	To update the leaderboard data:

	1. Push new JSONL files to the `openhands-index-results` repository
	2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
	3. The app will automatically fetch the latest data on restart

	## Mock Data

	Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
	- During development and testing
	- When the GitHub repository is unavailable
	- As a template for the expected data format