Jasonkim8652 committed on
Commit eecaec9 · verified · 1 Parent(s): 372e094

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +25 -4
README.md CHANGED
@@ -1,12 +1,33 @@
  ---
  title: BioDesignBench Leaderboard
- emoji: 🐠
- colorFrom: purple
+ emoji: "\U0001F9EC"
+ colorFrom: blue
  colorTo: purple
  sdk: gradio
- sdk_version: 6.8.0
+ sdk_version: "4.44.0"
  app_file: app.py
  pinned: false
+ license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # BioDesignBench Leaderboard
+
+ Evaluating LLM Agents on Protein Design via MCP Tools.
+
+ **Romero Lab, Duke University**
+
+ ## Overview
+
+ BioDesignBench is the first comprehensive benchmark for evaluating LLM agents on
+ protein design tasks via MCP (Model Context Protocol) tool use. This leaderboard
+ tracks agent performance across 76 design tasks spanning 17 taxonomy cells
+ (5 DesignTaskTypes x 6 BiologicalContexts), scored on a 100-point rubric with
+ 6 components: Approach, Orchestration, Quality, Feasibility, Novelty, Diversity.
+
+ ## Features
+
+ - **Overall Leaderboard** — Mixed-ranking table with baselines and LLM agents
+ - **Taxonomy Breakdown** — Heatmap of per-cell scores across 17 taxonomy cells
+ - **Component Analysis** — Radar and bar charts comparing 6 scoring components
+ - **Benchmark vs User Mode** — Paired comparison of the same LLM in two modes
+ - **About** — Methodology, submission guide, and citation info
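
Note on the `emoji` change: the new front matter writes the emoji as the double-quoted YAML escape `"\U0001F9EC"` rather than as a literal character. A minimal sketch of what that escape denotes, assuming only that Python string literals use the same `\UXXXXXXXX` form (the variable names are illustrative and not part of the commit):

```python
# The README front matter stores the emoji as the YAML escape "\U0001F9EC".
# Python string literals use the same \UXXXXXXXX escape syntax, so the
# value can be inspected directly.
emoji = "\U0001F9EC"

# The escape denotes a single Unicode code point.
codepoint = f"U+{ord(emoji):04X}"

# Encoded as UTF-8, that code point occupies four bytes.
utf8_bytes = emoji.encode("utf-8")

print(emoji, codepoint, len(utf8_bytes))
```

Writing the value as an escape keeps the file pure ASCII, which may be why the upload tooling emitted it this way; the commit itself does not state the reason.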