Update Frontend description to use SWE-bench Multimodal (Verified)
eb1e409
openhands committed
Move Download column to benchmark-specific tables only
4d0ae13
openhands committed
Add Download column for trajectory archives and increase table font size
b5317d7
openhands committed
Fix Pareto frontier calculation and display
1739efc
openhands committed
Fix: Treat zero scores as valid results, not missing categories
eff7f34
openhands committed
Reduce cache TTL from 1 hour to 15 minutes for faster data refresh
da96293
openhands committed
Remove multi-swe-bench from OpenHands Index
b978a6b
openhands committed
Add timer-based auto-refresh for leaderboard data
974f31f
openhands committed
Move 'Show incomplete entries' checkbox above plot and apply filter to both
361b5c2
openhands committed
Add periodic cache refresh for leaderboard data
6bddf26
openhands committed
Change blue links and text to dark grey (#555555)
d68f190
openhands committed
Remove illustrative figures from category sub-pages
d2c3c09
openhands committed
Update DeepSeek logo, tooltip format, and category names
5778893
openhands committed
Fix frontier labels to use log10 coordinates for log scale axis
8cdce51
openhands committed
Method C: Use domain coordinates for layout images with log scale
4a4a7b5
openhands committed
Method B: Use raw data values for layout images (revert log transformation)
46ff970
openhands committed
Method A: Fix logo positions for log scale x-axis
7b4a3a1
openhands committed
Enable log scale for x-axis (cost) in graphs
5d32e7b
openhands committed
Trigger rebuild
8d4cfda
openhands committed
Update intro text to focus on motivation rather than metrics
369c590
openhands committed
Fix table icons layout and add Qwen/MiniMax logos
72b86cb
openhands committed
UI cleanup and About page updates
6737ff3
openhands committed
Fix graph alignment issues
af81bcf
openhands committed
Multiple graph and table improvements
fcb3d0b
openhands committed
Replace open/closed model distinction with lock emojis in tables
8a3a9eb
openhands committed
Remove open/closed distinction from graph, use company logos as data points
b6ec318
openhands committed
Add company logos to graphs and tables, label frontier points with model names
800e404
openhands committed
Replace total_cost with cost_per_instance (average cost per instance)
b1f3e49
openhands committed
Fix TypeError when summing costs with None values
cdd40ba
openhands committed
Remove Test Set/Validation Set tabs, keep single results view
b8aea20
openhands committed
fix: Update Total cost description in intro to be sum, not average
6a5c447
openhands committed
docs: Update descriptive text to use Average Score and Total Cost
bb0f7af
openhands committed
fix: Column naming and incomplete entries toggle
4ab5f97
openhands committed
feat: Update leaderboard calculations and add incomplete entries toggle
5998027
openhands committed
feat: Add open_weights to openness mapping
55da48c
openhands committed
feat: Use pydantic schema models from openhands-index-results for validation
f0339f3
openhands committed
Fix: Handle pd.NA values in calculate_attempted function
28554f6
openhands committed
Update fallback category mappings: place SWE-Bench Multimodal under 'Frontend Development' and Swt-Bench under 'Test Generation'.

Co-authored-by: openhands <openhands@all-hands.dev>
b42a4fe
openhands committed
Move commit0 to 'App Creation' category in fallback mappings.

Co-authored-by: openhands <openhands@all-hands.dev>
b16f7da
openhands committed
Fix UI score formatting: do not coerce NaN to 0; rely on format_score_column to show 'Not Submitted'.

Co-authored-by: openhands <openhands@all-hands.dev>
c68aa7d
openhands committed
Fix score formatting to avoid coercing NaN to 0; show 'Not Submitted' instead.

Co-authored-by: openhands <openhands@all-hands.dev>
5d82fab
openhands committed
Fix data plotting requirements and server port handling; ensure per-benchmark plots use correct agent column.

- Respect HOST/PORT env for local runs
- Use 'OpenHands Version' in plot requirements
- Avoid plotting when use_plotly=False

Co-authored-by: openhands <openhands@all-hands.dev>
fb3d0db
openhands committed
CRITICAL FIX: Add fallback category mappings for data without agenteval.json
b4ac443
openhands committed
Add debug logging to track data loading on HuggingFace Space
044cdf4
openhands committed
Force rebuild: Trigger HuggingFace Space to fetch latest data from GitHub
8be216f
openhands committed
Fix Categories Attempted calculation to handle missing category columns correctly
0718569
openhands committed
Remove unused AstaBench category files and update UI to OpenHands categories
6a0d1cb
openhands committed
Add Acknowledgements section crediting AstaBench
737a3f2
openhands committed
Fix score calculation to match AstaBench methodology and update categories