medrax2 / benchmarking

Commit History

yes
dba3d2e

Junzhe Li committed on

Merge branch 'main' into tool-changes
e4e9fae

VictorLJZ committed on

fixing agent
fd330d9

Junzhe Li committed on

Set cuda:0 for medrax provider
35e8f3b

Adibvafa committed on

Merge branch 'main' into feature/benchmark
e6e3048

VictorLJZ committed on

fixes
d38594f

Junzhe Li committed on

revamped benchmarking suite
89321e2

Junzhe Li committed on

updated benchmarks
b93ad3f

Junzhe Li committed on

Fix merge conflicts
128b355

Adibvafa committed on

fixed rexvqa benchmark and added handling for image norm for tools
16278b5

victorli committed on

partially fixed rexvqa benchmark
c963ad3

victorli committed on

added concurrency to benchmarking
aa37a55

victorli committed on

Fix merge conflicts
bc86327

Adibvafa committed on

Setup MedRAX2 for test
e116685

Adibvafa committed on

fixed medgemma
7b3e756

victorli committed on

cleared merge issues
e97f266

victorli committed on

adding medgemma
1516987

victorli committed on

final updates
044eaf7

VictorLJZ committed on

refactored tools
25fe9b8

VictorLJZ committed on

added duckduckgo tool
7608486

VictorLJZ committed on

add cuda
6ee3108

Adibvafa committed on

updates
358df7a

VictorLJZ committed on

updates
148dc3c

VictorLJZ committed on

updates
d6cb1b4

VictorLJZ committed on

final fixed version
c90e4b6

VictorLJZ committed on

changes
6db0a72

VictorLJZ committed on

modified agent setup
05f20bf

VictorLJZ committed on

Change formatting to use boxed
36f77f2

Adibvafa committed on

Prep for actual benchmarking
7f4d4c2

Adibvafa committed on

fixed medrax provider
e25f680

VictorLJZ committed on

updates
587ddab

VictorLJZ committed on

Merge branch 'feature/benchmark' into victor
4216843

VictorLJZ committed on

Seed random shuffle
6bbb74f

Adibvafa committed on

Enable shuffling for benchmark
ddb8bad

Adibvafa committed on

Merge branch 'feature/benchmark' into victor
35025a2

VictorLJZ committed on

updates
d01acb6

VictorLJZ committed on

refactored rexvqa benchmark
eaff77c

VictorLJZ committed on

Fix prompt load
a7d0aad

Adibvafa committed on

Support individual logging
5084d75

Adibvafa committed on

removed anthropic provider
f350e79

VictorLJZ committed on

added anthropic provider
b0f54ed

VictorLJZ committed on

addressed PR comments
9c54b1c

VictorLJZ committed on

added openrouter provider
f60c51c

VictorLJZ committed on

adding this just in case
e040fb2

VictorLJZ committed on

rexvqa now works
c7b65ec

VictorLJZ committed on

updates
d07a267

VictorLJZ committed on

added chestagentbench
2b26ed4

VictorLJZ committed on

updates
e08f161

VictorLJZ committed on

simplifying evaluations
a8f2960

VictorLJZ committed on