Commit History

update name
9aa418f

MedGRPO Team committed on

update
0562da7

MedGRPO Team committed on

improve the prompt
758710c

MedGRPO Team committed on

update
4752404

MedGRPO Team committed on

upload prediction only
b28cd8f

MedGRPO Team committed on

Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions
18339c0

MedGRPO Team committed on

Fix eval_dvc.py main() to support --skip-llm-judge flag
5f41159

MedGRPO Team committed on

Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set
dd1b9c6

MedGRPO Team, Claude Sonnet 4.5 committed on

Add semantic similarity matching for Next Action evaluation
a66b9a4

MedGRPO Team, Claude Sonnet 4.5 committed on

Fix TAL overall metrics computation and extraction
c8f4cad

MedGRPO Team committed on

Use STG overall metrics instead of placeholder
d1337a3

MedGRPO Team committed on

Fix STG evaluation to extract bbox_dict from struc_info
0b29eca

MedGRPO Team committed on

Add placeholder for STG metrics when evaluation returns empty
6d00cf8

MedGRPO Team committed on

Fix STG and next_action metric printing in overall mode
049c07c

MedGRPO Team committed on

Show evaluation metrics even in silent mode (--grouping overall)
2323031

MedGRPO Team committed on

Filter DVC/VS/RC from tasks list when skip-llm-judge is set
2bd924c

MedGRPO Team committed on

Remove debug messages - system working correctly
80e3e7d

MedGRPO Team committed on

Add debug inside loop to verify entry
d53d0f7

MedGRPO Team committed on

Add debug at line 741
d878ea1

MedGRPO Team committed on

Add debug before task list print
c886e23

MedGRPO Team committed on

Debug before/after imports
4e448a4

MedGRPO Team committed on

Add debug at function entry and after analyze
c736ac8

MedGRPO Team committed on

Add explicit flush debug
1c45d6c

MedGRPO Team committed on

update
f0846a5

MedGRPO Team committed on

update
1af117c

MedGRPO Team committed on

update
a605ebb

MedGRPO Team committed on

Implement secure ground truth with prediction-only submission format
fe743f5

MedGRPO Team committed on

Add support for pre-computed LLM judge scores
31817d3

MedGRPO Team, Claude Sonnet 4.5 committed on

Update evaluation metrics and leaderboard display
a36b7fe

MedGRPO Team, Claude Sonnet 4.5 committed on

Complete evaluator fixes for all 8 tasks
331979f

MedGRPO Team committed on

Fix syntax errors and add TAL wrapper functions
8c805bc

MedGRPO Team committed on

Fix evaluate_all_pai.py to use eval_caption_llm_judge
58dd6d7

MedGRPO Team committed on

Consolidate evaluation scripts and remove hardcoded paths
3ea8a3a

MedGRPO Team committed on

Remove all captioning_metrics dependencies
5a5d9ce

MedGRPO Team committed on

Remove unused captioning_metrics folder
a09374a

MedGRPO Team committed on

Add server-side LLM judge for caption evaluation
da2e674

MedGRPO Team committed on

Make evaluation scripts fully self-contained
82f81ab

MedGRPO Team committed on

Optimize evaluation scripts - remove optional files
f986294

MedGRPO Team, Claude Sonnet 4.5 committed on

Copy evaluation scripts to leaderboard and clean up template code
ba8d0d4

MedGRPO Team, Claude Sonnet 4.5 committed on