Spaces: RomeroLab-Duke/BioDesignBench-Leaderboard

Jasonkim8652 committed on Mar 3

Commit eecaec9 (verified) · 1 Parent(s): 372e094

Upload README.md with huggingface_hub

Files changed (1)

README.md CHANGED

@@ -1,12 +1,33 @@
 ---
 title: BioDesignBench Leaderboard
-emoji: 🐠
-colorFrom: purple
+emoji: "\U0001F9EC"
+colorFrom: blue
 colorTo: purple
 sdk: gradio
-sdk_version: 6.8.0
+sdk_version: "4.44.0"
 app_file: app.py
 pinned: false
+license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# BioDesignBench Leaderboard
+Evaluating LLM Agents on Protein Design via MCP Tools.
+**Romero Lab, Duke University**
+## Overview
+BioDesignBench is the first comprehensive benchmark for evaluating LLM agents on
+protein design tasks via MCP (Model Context Protocol) tool use. This leaderboard
+tracks agent performance across 76 design tasks spanning 17 taxonomy cells
+(5 DesignTaskTypes x 6 BiologicalContexts), scored on a 100-point rubric with
+6 components: Approach, Orchestration, Quality, Feasibility, Novelty, Diversity.
+## Features
+- **Overall Leaderboard** — Mixed-ranking table with baselines and LLM agents
+- **Taxonomy Breakdown** — Heatmap of per-cell scores across 17 taxonomy cells
+- **Component Analysis** — Radar and bar charts comparing 6 scoring components
+- **Benchmark vs User Mode** — Paired comparison of the same LLM in two modes
+- **About** — Methodology, submission guide, and citation info
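
Reviewer note: the Overview added in this commit describes a 100-point rubric with six named components and a 5 x 6 taxonomy grid of which 17 cells are covered. The sketch below restates that structure in Python as a sanity check on the numbers. It is not taken from the Space's `app.py`. The `RubricScores` class, the plain-sum aggregation, and the example point values are all assumptions made for illustration; the README does not say how many points each component carries.

```python
from dataclasses import dataclass, fields

# Figures stated in the README overview.
NUM_TAXONOMY_CELLS = 17
NUM_DESIGN_TASK_TYPES = 5
NUM_BIOLOGICAL_CONTEXTS = 6
MAX_SCORE = 100.0


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical container for points earned on the six rubric components."""
    approach: float
    orchestration: float
    quality: float
    feasibility: float
    novelty: float
    diversity: float

    def total(self) -> float:
        # Assumption: the overall score is the plain sum of the six components.
        return sum(getattr(self, f.name) for f in fields(self))


# A 5 x 6 grid allows 30 cells; the README reports 17 of them.
possible_cells = NUM_DESIGN_TASK_TYPES * NUM_BIOLOGICAL_CONTEXTS
uncovered_cells = possible_cells - NUM_TAXONOMY_CELLS

# Made-up component values, purely for illustration.
example = RubricScores(15.0, 10.0, 30.0, 12.0, 8.0, 5.0)
assert 0.0 <= example.total() <= MAX_SCORE
```

If the real rubric weights components unequally or caps them individually, `total()` would need per-component maxima; those values would have to come from the benchmark's methodology section.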