Spaces: UIIAmerica/MedVidBench-Leaderboard
Branch: main
Path: MedVidBench-Leaderboard/evaluation
Size: 231 kB, 4 contributors, 39 commits
Latest commit: "update name" (9aa418f) by MedGRPO Team, about 3 hours ago
Name                          Size     Last updated        Last commit
llm_judge                     -        6 days ago          Add server-side LLM judge for caption evaluation
README.md                     11.5 kB  about 14 hours ago  update
dataset_utils.py              3.09 kB  6 days ago          Copy evaluation scripts to leaderboard and clean up template code
eval_caption_llm_judge.py     21.6 kB  about 3 hours ago   update name
eval_cvs_assessment.py        13.7 kB  6 days ago          Fix syntax errors and add TAL wrapper functions
eval_dvc.py                   9.98 kB  about 8 hours ago   Fix eval_dvc.py main() to support --skip-llm-judge flag
eval_next_action.py           23.4 kB  about 9 hours ago   Add semantic similarity matching for Next Action evaluation
eval_skill_assessment.py      15.3 kB  6 days ago          Fix syntax errors and add TAL wrapper functions
eval_stg.py                   12.3 kB  about 9 hours ago   Fix STG evaluation to extract bbox_dict from struc_info
eval_tal.py                   11.2 kB  about 9 hours ago   Fix TAL overall metrics computation and extraction
evaluate_all_pai.py           42.5 kB  about 8 hours ago   Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions
evaluate_predictions.py       10.9 kB  about 4 hours ago   upload prediction only
extract_predictions.py        2.74 kB  about 4 hours ago   upload prediction only
merge_predictions_with_gt.py  5.9 kB   about 4 hours ago   upload prediction only
test_evaluation.sh            5.34 kB  about 14 hours ago  update