OpenHands/openhands-index

Commit History

Empty PR to test CI

da83a25

openhands committed on Jan 26

Update Frontend description to use SWE-bench Multimodal (Verified)

eb1e409

openhands committed on Jan 26

Move Download column to benchmark-specific tables only

4d0ae13

openhands committed on Jan 22

Add Download column for trajectory archives and increase table font size

b5317d7

openhands committed on Jan 22

Fix Pareto frontier calculation and display

1739efc

openhands committed on Jan 22

Fix: Treat zero scores as valid results, not missing categories

eff7f34

openhands committed on Jan 22

Reduce cache TTL from 1 hour to 15 minutes for faster data refresh

da96293

openhands committed on Jan 22

Remove multi-swe-bench from OpenHands Index

b978a6b

openhands committed on Jan 21

Add timer-based auto-refresh for leaderboard data

974f31f

openhands committed on Jan 21

Move 'Show incomplete entries' checkbox above plot and apply filter to both

361b5c2

openhands committed on Jan 20

Add periodic cache refresh for leaderboard data

6bddf26

openhands committed on Jan 20

Change blue links and text to dark grey (#555555)

d68f190

openhands committed on Jan 18

Remove illustrative figures from category sub-pages

d2c3c09

openhands committed on Jan 18

Update DeepSeek logo, tooltip format, and category names

5778893

openhands committed on Jan 18

Fix frontier labels to use log10 coordinates for log scale axis

8cdce51

openhands committed on Jan 18

Method C: Use domain coordinates for layout images with log scale

4a4a7b5

openhands committed on Jan 18

Method B: Use raw data values for layout images (revert log transformation)

46ff970

openhands committed on Jan 18

Method A: Fix logo positions for log scale x-axis

7b4a3a1

openhands committed on Jan 18

Enable log scale for x-axis (cost) in graphs

5d32e7b

openhands committed on Jan 18

Trigger rebuild

8d4cfda

openhands committed on Jan 18

Update intro text to focus on motivation rather than metrics

369c590

openhands committed on Jan 18

Fix table icons layout and add Qwen/MiniMax logos

72b86cb

openhands committed on Jan 18

UI cleanup and About page updates

6737ff3

openhands committed on Jan 18

Fix graph alignment issues

af81bcf

openhands committed on Jan 18

Multiple graph and table improvements

fcb3d0b

openhands committed on Jan 18

Replace open/closed model distinction with lock emojis in tables

8a3a9eb

openhands committed on Jan 18

Remove open/closed distinction from graph, use company logos as data points

b6ec318

openhands committed on Jan 18

Add company logos to graphs and tables, label frontier points with model names

800e404

openhands committed on Jan 18

Replace total_cost with cost_per_instance (average cost per instance)

b1f3e49

openhands committed on Jan 17

Fix TypeError when summing costs with None values

cdd40ba

openhands committed on Jan 17

Remove Test Set/Validation Set tabs, keep single results view

b8aea20

openhands committed on Jan 17

fix: Update Total cost description in intro to be sum, not average

6a5c447

openhands committed on Jan 16

docs: Update descriptive text to use Average Score and Total Cost

bb0f7af

openhands committed on Jan 16

fix: Column naming and incomplete entries toggle

4ab5f97

openhands committed on Jan 16

feat: Update leaderboard calculations and add incomplete entries toggle

5998027

openhands committed on Jan 16

feat: Add open_weights to openness mapping

55da48c

openhands committed on Jan 16

feat: Use pydantic schema models from openhands-index-results for validation

f0339f3

openhands committed on Jan 16

Fix: Handle pd.NA values in calculate_attempted function

28554f6

openhands committed on Jan 16

Update fallback category mappings: place SWE-Bench Multimodal under 'Frontend Development' and Swt-Bench under 'Test Generation'.

Co-authored-by: openhands <openhands@all-hands.dev>

b42a4fe

openhands committed on Nov 25, 2025

Move commit0 to 'App Creation' category in fallback mappings.

Co-authored-by: openhands <openhands@all-hands.dev>

b16f7da

openhands committed on Nov 25, 2025

Fix UI score formatting: do not coerce NaN to 0; rely on format_score_column to show 'Not Submitted'.

Co-authored-by: openhands <openhands@all-hands.dev>

c68aa7d

openhands committed on Nov 25, 2025

Fix score formatting to avoid coercing NaN to 0; show 'Not Submitted' instead.

Co-authored-by: openhands <openhands@all-hands.dev>

5d82fab

openhands committed on Nov 25, 2025

Fix data plotting requirements and server port handling; ensure per-benchmark plots use correct agent column.

- Respect HOST/PORT env for local runs
- Use 'OpenHands Version' in plot requirements
- Avoid plotting when use_plotly=False

Co-authored-by: openhands <openhands@all-hands.dev>

fb3d0db

openhands committed on Nov 25, 2025

CRITICAL FIX: Add fallback category mappings for data without agenteval.json

b4ac443

openhands committed on Nov 25, 2025

Add debug logging to track data loading on HuggingFace Space

044cdf4

openhands committed on Nov 25, 2025

Force rebuild: Trigger HuggingFace Space to fetch latest data from GitHub

8be216f

openhands committed on Nov 25, 2025

Fix Categories Attempted calculation to handle missing category columns correctly

0718569

openhands committed on Nov 25, 2025

Remove unused AstaBench category files and update UI to OpenHands categories

6a0d1cb

openhands committed on Nov 25, 2025

Add Acknowledgements section crediting AstaBench

737a3f2

openhands committed on Nov 25, 2025

Fix score calculation to match AstaBench methodology and update categories

e734bf6

openhands committed on Nov 25, 2025