Commit History

update name
9aa418f

MedGRPO Team committed on

update
0562da7

MedGRPO Team committed on

improve the prompt
758710c

MedGRPO Team committed on

update path
67393fd

MedGRPO Team committed on

update
176a6d5

MedGRPO Team committed on

update
4752404

MedGRPO Team committed on

clean the code
6edbd17

MedGRPO Team committed on

update
8ef4c38

MedGRPO Team committed on

update GT path
9bb9edd

MedGRPO Team committed on

update
5a09659

MedGRPO Team committed on

upload prediction only
b28cd8f

MedGRPO Team committed on

update
2362e57

MedGRPO Team committed on

update
f0e43d6

MedGRPO Team committed on

update
4a199ff

MedGRPO Team committed on

Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions
18339c0

MedGRPO Team committed on

Fix eval_dvc.py main() to support --skip-llm-judge flag
5f41159

MedGRPO Team committed on

Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set
dd1b9c6

MedGRPO Team, Claude Sonnet 4.5 committed on

Add semantic similarity matching for Next Action evaluation
a66b9a4

MedGRPO Team, Claude Sonnet 4.5 committed on

Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy
6d8dbb2

MedGRPO Team committed on

Fix TAL overall metrics computation and extraction
c8f4cad

MedGRPO Team committed on

Use STG overall metrics instead of placeholder
d1337a3

MedGRPO Team committed on

Fix STG evaluation to extract bbox_dict from struc_info
0b29eca

MedGRPO Team committed on

Add placeholder for STG metrics when evaluation returns empty
6d00cf8

MedGRPO Team committed on

Fix STG and next_action metric printing in overall mode
049c07c

MedGRPO Team committed on

Show evaluation metrics even in silent mode (--grouping overall)
2323031

MedGRPO Team committed on

Keep --skip-llm-judge flag, set caption metrics to 0 when skipped
5310708

MedGRPO Team committed on

Remove --skip-llm-judge flag for production
3a69282

MedGRPO Team committed on

Add process.wait() to ensure returncode is set
77d73db

MedGRPO Team committed on

Add matplotlib to requirements
da11adf

MedGRPO Team committed on

Filter DVC/VS/RC from tasks list when skip-llm-judge is set
2bd924c

MedGRPO Team committed on

Add back --skip-llm-judge for faster testing
ebf8102

MedGRPO Team committed on

Remove skip-llm-judge - use semantic similarity fallback
a3fa530

MedGRPO Team committed on

Add --skip-llm-judge flag for faster evaluation
3487a07

MedGRPO Team committed on

Remove debug messages - system working correctly
80e3e7d

MedGRPO Team committed on

Add debug inside loop to verify entry
d53d0f7

MedGRPO Team committed on

Add debug at line 741
d878ea1

MedGRPO Team committed on

Add debug before task list print
c886e23

MedGRPO Team committed on

Remove DEBUG filter - allow debug messages through
2497030

MedGRPO Team committed on

Debug before/after imports
4e448a4

MedGRPO Team committed on

Add debug at function entry and after analyze
c736ac8

MedGRPO Team committed on

Add explicit flush debug
1c45d6c

MedGRPO Team committed on

update
f0846a5

MedGRPO Team committed on

update
1af117c

MedGRPO Team committed on

update
af13c42

MedGRPO Team committed on

update
ebac2de

MedGRPO Team committed on

update
807bf44

MedGRPO Team committed on

add merged format
e1e2d25

MedGRPO Team committed on

update
a605ebb

MedGRPO Team committed on

update name
04f5f37

MedGRPO Team committed on

update
73fd321

MedGRPO Team committed on