Commit History

Fix eval_dvc.py main() to support --skip-llm-judge flag
5f41159

MedGRPO Team committed on

Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set
dd1b9c6

MedGRPO Team, Claude Sonnet 4.5 committed on

update
a605ebb

MedGRPO Team committed on

Update evaluation metrics and leaderboard display
a36b7fe

MedGRPO Team, Claude Sonnet 4.5 committed on

Complete evaluator fixes for all 8 tasks
331979f

MedGRPO Team committed on

Remove all captioning_metrics dependencies
5a5d9ce

MedGRPO Team committed on

Make evaluation scripts fully self-contained
82f81ab

MedGRPO Team committed on

Copy evaluation scripts to leaderboard and clean up template code
ba8d0d4

MedGRPO Team, Claude Sonnet 4.5 committed on