Commit History
remove all the with hint result d786aec
add llama-3.1 result fb96108
rename OpenDevin to OpenHands adf5af2
add 2nd run 455affb
--global committed on
add gpt-4o-mini result 3d1d4f1
Xingyao Wang committed on
Revert "add result from gpt-4o-mini" 12597ea
Xingyao Wang committed on
add result from gpt-4o-mini 3d406f5
Xingyao Wang committed on
update the last missing instance 1aaf82f
Xingyao Wang committed on
update result from pr2489 fc34a41
Xingyao Wang committed on
remove keys 77dbd55
Xingyao Wang committed on
revoke keys a34dfe3
Xingyao Wang committed on
add gpqa result 804693c
Xingyao Wang committed on
update v1.8 perf ec5bc65
Xingyao Wang committed on
add result for v1.8 no-hint gpt4o bd3dee6
Xingyao Wang committed on
add v1.8 result bb84cd4
Xingyao Wang committed on
update results using new ver of swebench 091b42e
Xingyao Wang committed on
add claude-3.5 result 1aa3b7d
Xingyao Wang committed on
update old result w/ swe-bench latest harness; 68dee1f
Xingyao Wang committed on
improved patch apply 9071da3
Xingyao Wang committed on
improved patch apply a4e8ae8
Xingyao Wang committed on
add report field 5abf617
Xingyao Wang committed on
Add CodeAct 1.6 no hint f47ed15 verified
add result for codeact 1.6 03f74db
Xingyao Wang committed on
add gpt-4-1106 results for codeact swe bb237c5
Xingyao Wang committed on
Merge commit 'edc3858a6ea5d0c7317b630024203af60e146b52' f55ef7f
Xingyao Wang committed on
update all swebench lite 78d8859
Xingyao Wang committed on
Update outputs/miniwob/README.md edc3858 verified
Update outputs/webarena/README.md c89a626 verified
Create README.md cfa8976 verified
Create README.md c323f7b verified
remove extra merged file 29a3904
Xingyao Wang committed on
add Mixtral 4731bca
Xingyao Wang committed on
update results for CodeActSWEAgent 81fb631
Xingyao Wang committed on
remove output merged for a new format 77b13b9
Xingyao Wang committed on
Delete outputs/webarena/BrowsingAgent/gpt-4o-2024-05-13_maxiter_15_N_v1.0/output.jsonl 7168c1c verified
Delete outputs/webarena/BrowsingAgent/gpt-3.5-turbo-0125_maxiter_15_N_v1.0/output.jsonl fe88798 verified
add webarena and miniwob results (#5) aa9fe42 verified
Add MINT results (#6) 764b1c5 verified
agentbench (#3) e7273a2 verified
humanevalfix (#4) 9535215 verified
Create visualization for MINT benchmark & upload results (#2) 054cb87 verified
update results fe6c7e5
Xingyao Wang committed on
add results for deepseek chat v2 126490f
Xingyao Wang committed on
add codeact swe agent 9b33edf
Xingyao Wang committed on
add gpt4o result for 1.5 5dbfa12
Xingyao Wang committed on
move data to swe_bench_lite 23df10d
Xingyao Wang committed on
rename dir 0d2d477
Xingyao Wang committed on
add result for deepseek f07fb3e
Xingyao Wang committed on