Add semantic similarity matching for Next Action evaluation a66b9a4 MedGRPO Team Claude Sonnet 4.5 committed 4 days ago
Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy 6d8dbb2 MedGRPO Team committed 4 days ago
Keep --skip-llm-judge flag, set caption metrics to 0 when skipped 5310708 MedGRPO Team committed 4 days ago
Implement secure ground truth with prediction-only submission format fe743f5 MedGRPO Team committed 10 days ago
Add support for pre-computed LLM judge scores 31817d3 MedGRPO Team Claude Sonnet 4.5 committed 10 days ago
Make validation flexible: accept both 'answer'/'response' and 'gnd'/'ground_truth' field names 83aad2b MedGRPO Team committed 10 days ago
Fix leaderboard display: remove average column, show 10 metrics, fix Tasks tab 45e64f5 MedGRPO Team committed 10 days ago
Update evaluation metrics and leaderboard display a36b7fe MedGRPO Team Claude Sonnet 4.5 committed 10 days ago
Copy evaluation scripts to leaderboard and clean up template code ba8d0d4 MedGRPO Team Claude Sonnet 4.5 committed 10 days ago
Integrate MedGRPO evaluation pipeline with leaderboard 15c8be4 MedGRPO Team Claude Sonnet 4.5 committed 10 days ago
Fix Gradio schema error and deployment configuration 73ea6a1 MedGRPO Team Claude Sonnet 4.5 committed 10 days ago
Duplicate from gradio-templates/leaderboard b0e8748 gaozhongpai abidlabs HF Staff committed 10 days ago