Commits · UIIAmerica/MedVidBench-Leaderboard

update

9aa148b

MedGRPO Team committed 3 days ago

relax

0098a7d

MedGRPO Team committed 3 days ago

update path

67393fd

MedGRPO Team committed 4 days ago

update

176a6d5

MedGRPO Team committed 4 days ago

update

4752404

MedGRPO Team committed 4 days ago

update

8ef4c38

MedGRPO Team committed 4 days ago

update GT path

9bb9edd

MedGRPO Team committed 4 days ago

update

5a09659

MedGRPO Team committed 4 days ago

update

2362e57

MedGRPO Team committed 4 days ago

update

f0e43d6

MedGRPO Team committed 4 days ago

update

4a199ff

MedGRPO Team committed 4 days ago

Add semantic similarity matching for Next Action evaluation

a66b9a4

MedGRPO Team, Claude Sonnet 4.5 committed 4 days ago

Fix CVS_acc to use raw accuracy instead of component_balanced_accuracy

6d8dbb2

MedGRPO Team committed 4 days ago

Fix TAL overall metrics computation and extraction

c8f4cad

MedGRPO Team committed 4 days ago

Keep --skip-llm-judge flag, set caption metrics to 0 when skipped

5310708

MedGRPO Team committed 4 days ago

Remove --skip-llm-judge flag for production

3a69282

MedGRPO Team committed 4 days ago

Add process.wait() to ensure returncode is set

77d73db

MedGRPO Team committed 4 days ago

Add back --skip-llm-judge for faster testing

ebf8102

MedGRPO Team committed 4 days ago

Remove skip-llm-judge - use semantic similarity fallback

a3fa530

MedGRPO Team committed 4 days ago

Add --skip-llm-judge flag for faster evaluation

3487a07

MedGRPO Team committed 4 days ago

Remove DEBUG filter - allow debug messages through

2497030

MedGRPO Team committed 4 days ago

update

1af117c

MedGRPO Team committed 4 days ago

update

af13c42

MedGRPO Team committed 4 days ago

update

ebac2de

MedGRPO Team committed 4 days ago

update

807bf44

MedGRPO Team committed 4 days ago

add merged format

e1e2d25

MedGRPO Team committed 4 days ago

update name

04f5f37

MedGRPO Team committed 8 days ago

Implement secure ground truth with prediction-only submission format

fe743f5

MedGRPO Team committed 10 days ago

Add support for pre-computed LLM judge scores

31817d3

MedGRPO Team, Claude Sonnet 4.5 committed 10 days ago

Make validation flexible: accept both 'answer'/'response' and 'gnd'/'ground_truth' field names

83aad2b

MedGRPO Team committed 10 days ago

Add contact column back to leaderboard display

4510cf8

MedGRPO Team committed 10 days ago

Fix leaderboard display: remove average column, show 10 metrics, fix Tasks tab

45e64f5

MedGRPO Team committed 10 days ago

Update evaluation metrics and leaderboard display

a36b7fe

MedGRPO Team, Claude Sonnet 4.5 committed 10 days ago

Copy evaluation scripts to leaderboard and clean up template code

ba8d0d4

MedGRPO Team, Claude Sonnet 4.5 committed 10 days ago

Integrate MedGRPO evaluation pipeline with leaderboard

15c8be4

MedGRPO Team, Claude Sonnet 4.5 committed 10 days ago

Fix Gradio schema error and deployment configuration

73ea6a1

MedGRPO Team, Claude Sonnet 4.5 committed 10 days ago

Duplicate from gradio-templates/leaderboard

b0e8748

gaozhongpai

abidlabs HF Staff committed 10 days ago
