Spaces: UIIAmerica/MedVidBench-Leaderboard

Commit History

update

9aa148b

MedGRPO Team committed on Jan 14

relax

0098a7d

MedGRPO Team committed on Jan 14

update name

9aa418f

MedGRPO Team committed on Jan 14

update

0562da7

MedGRPO Team committed on Jan 14

improve the prompt

758710c

MedGRPO Team committed on Jan 14

update path

67393fd

MedGRPO Team committed on Jan 14

update

176a6d5

MedGRPO Team committed on Jan 14

update

4752404

MedGRPO Team committed on Jan 14

clean the code

6edbd17

MedGRPO Team committed on Jan 14

update

8ef4c38

MedGRPO Team committed on Jan 14

update GT path

9bb9edd

MedGRPO Team committed on Jan 14

update

5a09659

MedGRPO Team committed on Jan 14

upload prediction only

b28cd8f

MedGRPO Team committed on Jan 14

update

2362e57

MedGRPO Team committed on Jan 14

update

f0e43d6

MedGRPO Team committed on Jan 14

update

4a199ff

MedGRPO Team committed on Jan 14

Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions

18339c0

MedGRPO Team committed on Jan 13

Fix eval_dvc.py main() to support --skip-llm-judge flag

5f41159

MedGRPO Team committed on Jan 13

Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set

dd1b9c6

MedGRPO Team, Claude Sonnet 4.5 committed on Jan 13

Add semantic similarity matching for Next Action evaluation

a66b9a4

MedGRPO Team, Claude Sonnet 4.5 committed on Jan 13

Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy

6d8dbb2

MedGRPO Team committed on Jan 13

Fix TAL overall metrics computation and extraction

c8f4cad

MedGRPO Team committed on Jan 13

Use STG overall metrics instead of placeholder

d1337a3

MedGRPO Team committed on Jan 13

Fix STG evaluation to extract bbox_dict from struc_info

0b29eca

MedGRPO Team committed on Jan 13

Add placeholder for STG metrics when evaluation returns empty

6d00cf8

MedGRPO Team committed on Jan 13

Fix STG and next_action metric printing in overall mode

049c07c

MedGRPO Team committed on Jan 13

Show evaluation metrics even in silent mode (--grouping overall)

2323031

MedGRPO Team committed on Jan 13

Keep --skip-llm-judge flag, set caption metrics to 0 when skipped

5310708

MedGRPO Team committed on Jan 13

Remove --skip-llm-judge flag for production

3a69282

MedGRPO Team committed on Jan 13

Add process.wait() to ensure returncode is set

77d73db

MedGRPO Team committed on Jan 13

Add matplotlib to requirements

da11adf

MedGRPO Team committed on Jan 13

Filter DVC/VS/RC from tasks list when skip-llm-judge is set

2bd924c

MedGRPO Team committed on Jan 13

Add back --skip-llm-judge for faster testing

ebf8102

MedGRPO Team committed on Jan 13

Remove skip-llm-judge - use semantic similarity fallback

a3fa530

MedGRPO Team committed on Jan 13

Add --skip-llm-judge flag for faster evaluation

3487a07

MedGRPO Team committed on Jan 13

Remove debug messages - system working correctly

80e3e7d

MedGRPO Team committed on Jan 13

Add debug inside loop to verify entry

d53d0f7

MedGRPO Team committed on Jan 13

Add debug at line 741

d878ea1

MedGRPO Team committed on Jan 13

Add debug before task list print

c886e23

MedGRPO Team committed on Jan 13

Remove DEBUG filter - allow debug messages through

2497030

MedGRPO Team committed on Jan 13

Debug before/after imports

4e448a4

MedGRPO Team committed on Jan 13

Add debug at function entry and after analyze

c736ac8

MedGRPO Team committed on Jan 13

Add explicit flush debug

1c45d6c

MedGRPO Team committed on Jan 13

update

f0846a5

MedGRPO Team committed on Jan 13

update

1af117c

MedGRPO Team committed on Jan 13

update

af13c42

MedGRPO Team committed on Jan 13

update

ebac2de

MedGRPO Team committed on Jan 13

update

807bf44

MedGRPO Team committed on Jan 13

add merged format

e1e2d25

MedGRPO Team committed on Jan 13

update

a605ebb

MedGRPO Team committed on Jan 13
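
Note on commit 77d73db ("Add process.wait() to ensure returncode is set"): the repository code is not shown in this history, so the following is only a minimal sketch, assuming the evaluation is launched through Python's `subprocess.Popen`. The helper name `run_and_collect` and the child command are hypothetical. It illustrates the standard-library behaviour the commit message appears to refer to: `Popen.returncode` stays `None` until `poll()`, `wait()`, or `communicate()` has been called, even after the child's output has been read to the end.

```python
import subprocess
import sys


def run_and_collect(cmd):
    """Run cmd, read its stdout line by line, and report the return code
    as seen before and after calling wait(). Hypothetical helper."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    # Reading stdout to EOF does not update proc.returncode.
    lines = [line.rstrip("\n") for line in proc.stdout]
    before = proc.returncode  # still None: nothing has polled the child yet

    # wait() reaps the child and sets proc.returncode.
    proc.wait()
    proc.stdout.close()
    after = proc.returncode

    return before, after, lines


# Stand-in child process; the real evaluation command is not known here.
before, after, lines = run_and_collect([sys.executable, "-c", "print('ok')"])
print(before, after, lines)
```

Checking `proc.returncode` without the `wait()` call would treat a finished run as if it had no exit status, which is consistent with the problem the commit message describes.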