Hugging Face Space: UIIAmerica/MedVidBench-Leaderboard

MedVidBench-Leaderboard / evaluation
231 kB, 4 contributors, 39 commits
Latest commit: 9aa418f "update name" by MedGRPO Team, about 3 hours ago
  • llm_judge/ (directory): "Add server-side LLM judge for caption evaluation", 6 days ago
  • README.md (11.5 kB): "update", about 14 hours ago
  • dataset_utils.py (3.09 kB): "Copy evaluation scripts to leaderboard and clean up template code", 6 days ago
  • eval_caption_llm_judge.py (21.6 kB): "update name", about 3 hours ago
  • eval_cvs_assessment.py (13.7 kB): "Fix syntax errors and add TAL wrapper functions", 6 days ago
  • eval_dvc.py (9.98 kB): "Fix eval_dvc.py main() to support --skip-llm-judge flag", about 8 hours ago
  • eval_next_action.py (23.4 kB): "Add semantic similarity matching for Next Action evaluation", about 9 hours ago
  • eval_skill_assessment.py (15.3 kB): "Fix syntax errors and add TAL wrapper functions", 6 days ago
  • eval_stg.py (12.3 kB): "Fix STG evaluation to extract bbox_dict from struc_info", about 9 hours ago
  • eval_tal.py (11.2 kB): "Fix TAL overall metrics computation and extraction", about 9 hours ago
  • evaluate_all_pai.py (42.5 kB): "Fix evaluate_all_pai to pass --skip-llm-judge to task main() functions", about 8 hours ago
  • evaluate_predictions.py (10.9 kB): "upload prediction only", about 4 hours ago
  • extract_predictions.py (2.74 kB): "upload prediction only", about 4 hours ago
  • merge_predictions_with_gt.py (5.9 kB): "upload prediction only", about 4 hours ago
  • test_evaluation.sh (5.34 kB): "update", about 14 hours ago