Commit History
fix visualizer with latest streamlit feature 248fd06
add 2nd run 455affb
--global committed on
add gpt-4o-mini result 3d1d4f1
Xingyao Wang committed on
Revert "add result from gpt-4o-mini" 12597ea
Xingyao Wang committed on
add result from gpt-4o-mini 3d406f5
Xingyao Wang committed on
update the last missing instance 1aaf82f
Xingyao Wang committed on
update result from pr2489 fc34a41
Xingyao Wang committed on
remove keys 77dbd55
Xingyao Wang committed on
revoke keys a34dfe3
Xingyao Wang committed on
add gpqa result 804693c
Xingyao Wang committed on
update v1.8 perf ec5bc65
Xingyao Wang committed on
add result for v1.8 no-hint gpt4o bd3dee6
Xingyao Wang committed on
fix model_name in updated metadata df68ce0
Xingyao Wang committed on
add v1.8 result bb84cd4
Xingyao Wang committed on
update results using new ver of swebench 091b42e
Xingyao Wang committed on
set n error/stuck/cost to 0 for CodeAct exp run below v1.5 d2b6426
Xingyao Wang committed on
by default not showing with hint result ba8f82b
Xingyao Wang committed on
add claude-3.5 result 1aa3b7d
Xingyao Wang committed on
support loading report with new format e2ddd17
Xingyao Wang committed on
update gitignore 98bdf36
Xingyao Wang committed on
update old result w/ swe-bench latest harness; 68dee1f
Xingyao Wang committed on
improved patch apply 9071da3
Xingyao Wang committed on
improved patch apply a4e8ae8
Xingyao Wang committed on
add report field 5abf617
Xingyao Wang committed on
Add CodeAct 1.6 no hint f47ed15 verified
fix visualizer 913979f
Xingyao Wang committed on
fix visualizer to only display eval_report when it exists a4c5e33
Xingyao Wang committed on
add result for codeact 1.6 03f74db
Xingyao Wang committed on
only show swe bench on visualizer 705a1e5
Xingyao Wang committed on
change test_result to bool 1ae8615
Xingyao Wang committed on
fix fine-grained report; support visualization while running 7eb2653
Xingyao Wang committed on
add gpt-4-1106 results for codeact swe bb237c5
Xingyao Wang committed on
Merge commit 'edc3858a6ea5d0c7317b630024203af60e146b52' f55ef7f
Xingyao Wang committed on
update all swebench lite 78d8859
Xingyao Wang committed on
Update outputs/miniwob/README.md edc3858 verified
Update outputs/webarena/README.md c89a626 verified
Create README.md cfa8976 verified
Create README.md c323f7b verified
remove extra merged file 29a3904
Xingyao Wang committed on
add Mixtral 4731bca
Xingyao Wang committed on
support visualization of new swebench-eval 414a759
Xingyao Wang committed on
update results for CodeActSWEAgent 81fb631
Xingyao Wang committed on
remove output merged for a new format 77b13b9
Xingyao Wang committed on