Spaces: UIIAmerica/MedVidBench-Leaderboard
main
MedVidBench-Leaderboard/evaluation
231 kB, 4 contributors, History: 39 commits
MedGRPO Team: update name (9aa418f, about 2 months ago)
Name | Size | Last commit | Updated
llm_judge | - | Add server-side LLM judge for caption evaluation | about 2 months ago
README.md | 11.5 kB | update | about 2 months ago
dataset_utils.py | 3.09 kB | Copy evaluation scripts to leaderboard and clean up template code | about 2 months ago
eval_caption_llm_judge.py | 21.6 kB | update name | about 2 months ago
eval_cvs_assessment.py | 13.7 kB | Fix syntax errors and add TAL wrapper functions | about 2 months ago
eval_dvc.py | 9.98 kB | Fix eval_dvc.py main() to support --skip-llm-judge flag | about 2 months ago
eval_next_action.py | 23.4 kB | Add semantic similarity matching for Next Action evaluation | about 2 months ago
eval_skill_assessment.py | 15.3 kB | Fix syntax errors and add TAL wrapper functions | about 2 months ago
eval_stg.py | 12.3 kB | Fix STG evaluation to extract bbox_dict from struc_info | about 2 months ago
eval_tal.py | 11.2 kB | Fix TAL overall metrics computation and extraction | about 2 months ago
evaluate_all_pai.py | 42.5 kB | Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions | about 2 months ago
evaluate_predictions.py | 10.9 kB | upload prediction only | about 2 months ago
extract_predictions.py | 2.74 kB | upload prediction only | about 2 months ago
merge_predictions_with_gt.py | 5.9 kB | upload prediction only | about 2 months ago
test_evaluation.sh | 5.34 kB | update | about 2 months ago