Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions 18339c0 MedGRPO Team committed on about 20 hours ago
Fix eval_dvc.py main() to support --skip-llm-judge flag 5f41159 MedGRPO Team committed on about 20 hours ago
Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set dd1b9c6 MedGRPO Team and Claude Sonnet 4.5 committed on about 20 hours ago
Add semantic similarity matching for Next Action evaluation a66b9a4 MedGRPO Team and Claude Sonnet 4.5 committed on about 21 hours ago
Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy 6d8dbb2 MedGRPO Team committed on about 21 hours ago
Fix TAL overall metrics computation and extraction c8f4cad MedGRPO Team committed on about 21 hours ago
Fix STG evaluation to extract bbox_dict from struc_info 0b29eca MedGRPO Team committed on about 21 hours ago
Add placeholder for STG metrics when evaluation returns empty 6d00cf8 MedGRPO Team committed on about 21 hours ago
Fix STG and next_action metric printing in overall mode 049c07c MedGRPO Team committed on about 22 hours ago
Show evaluation metrics even in silent mode (--grouping overall) 2323031 MedGRPO Team committed on about 22 hours ago
Keep --skip-llm-judge flag, set caption metrics to 0 when skipped 5310708 MedGRPO Team committed on about 22 hours ago
Filter DVC/VS/RC from tasks list when skip-llm-judge is set 2bd924c MedGRPO Team committed on about 22 hours ago
Remove skip-llm-judge - use semantic similarity fallback a3fa530 MedGRPO Team committed on about 22 hours ago
Remove DEBUG filter - allow debug messages through 2497030 MedGRPO Team committed on about 23 hours ago