Commit History

update
9aa148b

MedGRPO Team committed on

relax
0098a7d

MedGRPO Team committed on

update path
67393fd

MedGRPO Team committed on

update
176a6d5

MedGRPO Team committed on

update
4752404

MedGRPO Team committed on

update
8ef4c38

MedGRPO Team committed on

update GT path
9bb9edd

MedGRPO Team committed on

update
5a09659

MedGRPO Team committed on

update
2362e57

MedGRPO Team committed on

update
f0e43d6

MedGRPO Team committed on

update
4a199ff

MedGRPO Team committed on

Add semantic similarity matching for Next Action evaluation
a66b9a4

MedGRPO Team, Claude Sonnet 4.5 committed on

Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy
6d8dbb2

MedGRPO Team committed on

Fix TAL overall metrics computation and extraction
c8f4cad

MedGRPO Team committed on

Keep --skip-llm-judge flag, set caption metrics to 0 when skipped
5310708

MedGRPO Team committed on

Remove --skip-llm-judge flag for production
3a69282

MedGRPO Team committed on

Add process.wait() to ensure returncode is set
77d73db

MedGRPO Team committed on

Add back --skip-llm-judge for faster testing
ebf8102

MedGRPO Team committed on

Remove skip-llm-judge - use semantic similarity fallback
a3fa530

MedGRPO Team committed on

Add --skip-llm-judge flag for faster evaluation
3487a07

MedGRPO Team committed on

Remove DEBUG filter - allow debug messages through
2497030

MedGRPO Team committed on

update
1af117c

MedGRPO Team committed on

update
af13c42

MedGRPO Team committed on

update
ebac2de

MedGRPO Team committed on

update
807bf44

MedGRPO Team committed on

add merged format
e1e2d25

MedGRPO Team committed on

update name
04f5f37

MedGRPO Team committed on

Implement secure ground truth with prediction-only submission format
fe743f5

MedGRPO Team committed on

Add support for pre-computed LLM judge scores
31817d3

MedGRPO Team, Claude Sonnet 4.5 committed on

Make validation flexible: accept both 'answer'/'response' and 'gnd'/'ground_truth' field names
83aad2b

MedGRPO Team committed on

Add contact column back to leaderboard display
4510cf8

MedGRPO Team committed on

Fix leaderboard display: remove average column, show 10 metrics, fix Tasks tab
45e64f5

MedGRPO Team committed on

Update evaluation metrics and leaderboard display
a36b7fe

MedGRPO Team, Claude Sonnet 4.5 committed on

Copy evaluation scripts to leaderboard and clean up template code
ba8d0d4

MedGRPO Team, Claude Sonnet 4.5 committed on

Integrate MedGRPO evaluation pipeline with leaderboard
15c8be4

MedGRPO Team, Claude Sonnet 4.5 committed on

Fix Gradio schema error and deployment configuration
73ea6a1

MedGRPO Team, Claude Sonnet 4.5 committed on

Duplicate from gradio-templates/leaderboard
b0e8748

gaozhongpai, abidlabs (HF Staff) committed on