Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions 18339c0 MedGRPO Team committed on Jan 13
Fix DVC evaluation to compute temporal F1 when --skip-llm-judge is set dd1b9c6 MedGRPO Team Claude Sonnet 4.5 committed on Jan 13
Add semantic similarity matching for Next Action evaluation a66b9a4 MedGRPO Team Claude Sonnet 4.5 committed on Jan 13
Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy 6d8dbb2 MedGRPO Team committed on Jan 13
Add placeholder for STG metrics when evaluation returns empty 6d00cf8 MedGRPO Team committed on Jan 13
Show evaluation metrics even in silent mode (--grouping overall) 2323031 MedGRPO Team committed on Jan 13
Keep --skip-llm-judge flag, set caption metrics to 0 when skipped 5310708 MedGRPO Team committed on Jan 13