Commit History
Date in summary table is missing (#10) 7949205
fix: Add Visualization column to main table (not just benchmark tables) 5f628a6
openhands committed on
feat: Add Visualization column for Laminar eval links dfa8bfc
openhands committed on
Skip entries with hide_from_leaderboard=True 2123552
openhands committed on
Remove agenteval.json, use hardcoded mappings as single source of truth 24ccc42
openhands committed on
Add Evolution Over Time and Open Model Accuracy by Size visualizations a4b9436
openhands committed on
Add runtime column and Cost/Performance + Runtime/Performance charts to all pages 2854ddd
openhands committed on
Move Download column to benchmark-specific tables only 4d0ae13
openhands committed on
Add Download column for trajectory archives and increase table font size b5317d7
openhands committed on
Fix Pareto frontier calculation and display 1739efc
openhands committed on
Remove multi-swe-bench from OpenHands Index b978a6b
openhands committed on
Update DeepSeek logo, tooltip format, and category names 5778893
openhands committed on
Replace total_cost with cost_per_instance (average cost per instance) b1f3e49
openhands committed on
Fix TypeError when summing costs with None values cdd40ba
openhands committed on
feat: Update leaderboard calculations and add incomplete entries toggle 5998027
openhands committed on
feat: Use pydantic schema models from openhands-index-results for validation f0339f3
openhands committed on
Update fallback category mappings: place SWE-Bench Multimodal under 'Frontend Development' and Swt-Bench under 'Test Generation'. Co-authored-by: openhands <openhands@all-hands.dev> b42a4fe
openhands committed on
Move commit0 to 'App Creation' category in fallback mappings. Co-authored-by: openhands <openhands@all-hands.dev> b16f7da
openhands committed on
CRITICAL FIX: Add fallback category mappings for data without agenteval.json b4ac443
openhands committed on
Add debug logging to track data loading on HuggingFace Space 044cdf4
openhands committed on
Fix score calculation to match AstaBench methodology and update categories e734bf6
openhands committed on
Fix agent_version display and make Overall Score bold 0e14c25
openhands committed on
Rename columns: Agent→OpenHands Version, Models Used→Language Model, remove Submitter 376500e
openhands committed on
Cleanup codebase: remove unused code, simplify data loading, and add pre-release notice 855423e
openhands committed on
Simplify leaderboard to open vs closed models only ca754bb
openhands committed on
Update data loader to support agent-centric directory structure e003f7b
openhands committed on
Update UI: All-Hands-AI color scheme, agent version column names, and OpenHands logo 0ee2099
openhands committed on
Fix category cost calculation - add category-level aggregation 5e9c3b9
openhands committed on
Fix null columns by mapping source data to expected column names aa07520
openhands committed on
Convert to JSONL data format and remove agent-eval dependency 1027cfb
openhands committed on
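Several commits in this history change how leaderboard costs and scores are computed: replacing total_cost with cost_per_instance (an average cost per instance), fixing a TypeError when summing costs that contain None values, and fixing the Pareto frontier calculation. The commit titles do not show the actual code. What follows is a minimal, hypothetical Python sketch of those two computations, not the Space's real implementation. The function names `cost_per_instance` and `pareto_frontier` and the `(name, cost, score)` tuple shape are illustrative assumptions. The sketch also assumes that lower cost and higher score are better, and that entries without a cost have already been filtered out before the frontier is computed.

```python
from typing import Optional, Sequence, Tuple


def cost_per_instance(costs: Sequence[Optional[float]]) -> Optional[float]:
    """Hypothetical helper: average cost per instance, skipping None values.

    Filtering out None before summing avoids the TypeError that sum()
    raises on a None element. Returns None if no instance has a cost.
    """
    known = [c for c in costs if c is not None]
    if not known:
        return None
    return sum(known) / len(known)


def pareto_frontier(points: Sequence[Tuple[str, float, float]]) -> list:
    """Hypothetical helper: names of (name, cost, score) entries on the
    cost/score Pareto frontier, ordered from cheapest to most expensive.

    Walks entries from cheapest to most expensive and keeps each one whose
    score strictly exceeds the best score seen so far. Among exact
    duplicates (same cost and score), only the first is kept.
    """
    # Ascending cost; for equal cost, highest score first so that the
    # lower-scoring equal-cost entries are treated as dominated.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier = []
    best_score = float("-inf")
    for name, cost, score in ordered:
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


if __name__ == "__main__":
    print(cost_per_instance([1.0, None, 3.0]))
    entries = [("a", 1.0, 0.5), ("b", 2.0, 0.4), ("c", 3.0, 0.7), ("d", 0.5, 0.3)]
    print(pareto_frontier(entries))
```

In the usage example, entry "b" is dropped from the frontier because "a" is both cheaper and higher-scoring. Sorting once and sweeping keeps the frontier computation at O(n log n), rather than comparing every pair of entries.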