Commit History
fix visualizer 913979f
Xingyao Wang committed on
fix visualizer to only display eval_report when it exists a4c5e33
Xingyao Wang committed on
add result for codeact 1.6 03f74db
Xingyao Wang committed on
only show swe bench on visualizer 705a1e5
Xingyao Wang committed on
change test_result to bool 1ae8615
Xingyao Wang committed on
fix fine-grained report; support visualization while running 7eb2653
Xingyao Wang committed on
add gpt-4-1106 results for codeact swe bb237c5
Xingyao Wang committed on
Merge commit 'edc3858a6ea5d0c7317b630024203af60e146b52' f55ef7f
Xingyao Wang committed on
update all swebench lite 78d8859
Xingyao Wang committed on
Update outputs/miniwob/README.md edc3858 verified
Update outputs/webarena/README.md c89a626 verified
Create README.md cfa8976 verified
Create README.md c323f7b verified
remove extra merged file 29a3904
Xingyao Wang committed on
add Mixtral 4731bca
Xingyao Wang committed on
support visualization of new swebench-eval 414a759
Xingyao Wang committed on
update results for CodeActSWEAgent 81fb631
Xingyao Wang committed on
remove output merged for a new format 77b13b9
Xingyao Wang committed on
Delete outputs/webarena/BrowsingAgent/gpt-4o-2024-05-13_maxiter_15_N_v1.0/output.jsonl 7168c1c verified
Delete outputs/webarena/BrowsingAgent/gpt-3.5-turbo-0125_maxiter_15_N_v1.0/output.jsonl fe88798 verified
add webarena and miniwob results (#5) aa9fe42 verified
Add MINT results (#6) 764b1c5 verified
agentbench (#3) e7273a2 verified
humanevalfix (#4) 9535215 verified
Create visualization for MINT benchmark & upload results (#2) 054cb87 verified
update results fe6c7e5
Xingyao Wang committed on
plot success rate with cost when available 743d952
Xingyao Wang committed on
add results for deepseek chat v2 126490f
Xingyao Wang committed on
add codeact swe agent 9b33edf
Xingyao Wang committed on
update gitignore 1c3a57d
Xingyao Wang committed on
add gpt4o result for 1.5 5dbfa12
Xingyao Wang committed on
move data to swe_bench_lite 23df10d
Xingyao Wang committed on
Merge commit 'f6d9f43457bdadd36685181efda2fd45e813a02c' d61638c
Xingyao Wang committed on
visualize swe-bench-lite & fix stuck in look 4deac19
Xingyao Wang committed on
add cost info when exists f6d9f43
Xingyao Wang committed on
show errrors 565afe1
Xingyao Wang committed on
rename dir 0d2d477
Xingyao Wang committed on
add result for deepseek f07fb3e
Xingyao Wang committed on
fix visualizer for json 260700f
Xingyao Wang committed on
fix glob 3c245bf
Xingyao Wang committed on
update visualizer on multi-page 1412295
Xingyao Wang committed on
add results for gpt-4o 72c2e93
Xingyao Wang committed on
change to only load merged 3bf3aaa
Xingyao Wang committed on
updare resykts cd893a5
Xingyao Wang committed on
Update README.md f995976 verified
add absolute number of solved 886e465
Xingyao Wang committed on
update float c6f2aaa
Xingyao Wang committed on
change to pct 5864960
Xingyao Wang committed on