Spaces: UIIAmerica / MedVidBench-Leaderboard

Directory: MedVidBench-Leaderboard / evaluation (231 kB, 4 contributors, 39 commits)
Latest commit: 9aa418f "update name" by MedGRPO Team, about 2 months ago

llm_judge/: Add server-side LLM judge for caption evaluation (about 2 months ago)
README.md, 11.5 kB: update (about 2 months ago)
dataset_utils.py, 3.09 kB: Copy evaluation scripts to leaderboard and clean up template code (about 2 months ago)
eval_caption_llm_judge.py, 21.6 kB: update name (about 2 months ago)
eval_cvs_assessment.py, 13.7 kB: Fix syntax errors and add TAL wrapper functions (about 2 months ago)
eval_dvc.py, 9.98 kB: Fix eval_dvc.py main() to support --skip-llm-judge flag (about 2 months ago)
eval_next_action.py, 23.4 kB: Add semantic similarity matching for Next Action evaluation (about 2 months ago)
eval_skill_assessment.py, 15.3 kB: Fix syntax errors and add TAL wrapper functions (about 2 months ago)
eval_stg.py, 12.3 kB: Fix STG evaluation to extract bbox_dict from struc_info (about 2 months ago)
eval_tal.py, 11.2 kB: Fix TAL overall metrics computation and extraction (about 2 months ago)
evaluate_all_pai.py, 42.5 kB: Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions (about 2 months ago)
evaluate_predictions.py, 10.9 kB: upload prediction only (about 2 months ago)
extract_predictions.py, 2.74 kB: upload prediction only (about 2 months ago)
merge_predictions_with_gt.py, 5.9 kB: upload prediction only (about 2 months ago)
test_evaluation.sh, 5.34 kB: update (about 2 months ago)