update leaderboard with rescored results and fair diversity formula
- README.md +79 -13
- app.py +3 -3
- eval_scorer.py +6 -5
- leaderboard_data.json +401 -250
README.md
@@ -12,22 +12,88 @@ license: mit

# BioDesignBench Leaderboard

-

**Romero Lab, Duke University**

- ##

-
-
-
-
-

- ##

-
-
-
-
-

# BioDesignBench Leaderboard

+ Interactive leaderboard for **BioDesignBench**, a benchmark evaluating LLM agents on protein design tasks via MCP (Model Context Protocol) tool use.

**Romero Lab, Duke University**

+ ## What the leaderboard shows

+ - **Overall Leaderboard** -- Mixed-ranking table with human baselines and LLM agents, filterable by mode (benchmark/user), MCP tool type (reference/custom), and entry type.
+ - **Taxonomy Breakdown** -- Heatmap of per-cell scores across 17 taxonomy cells (spanning 5 task types and 5 biological contexts) with an average-per-type bar chart.
+ - **Component Analysis** -- Radar and grouped bar charts comparing the 6 scoring components (Approach, Orchestration, Quality, Feasibility, Novelty, Diversity) between any two agents.
+ - **Benchmark vs User Mode** -- Paired comparison showing how the same LLM performs with minimal prompting (benchmark) vs rich guidance (user mode).
+ - **Submit** -- Form to submit your own protein design agent for evaluation.
+ - **About** -- Methodology, scoring rubric, submission guide, and citation.

+ ## Run locally

+ ```bash
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ The app launches a Gradio server at `http://localhost:7860`.
+
+ ## HuggingFace Space deployment
+
+ This directory is structured as a self-contained HF Space. To deploy:
+
+ 1. Create a new Space on HuggingFace (`sdk: gradio`).
+ 2. Push the contents of this directory to the Space repo.
+ 3. Set the `BDB_ADMIN_PASSWORD` secret in the Space settings for admin panel access.
+ 4. Optionally set `HF_TOKEN` for submission queue access (private dataset).
+
+ The Space will automatically build and serve the leaderboard.
+
+ ## How to update results
+
+ Add new entries to `leaderboard_data.json` following the existing schema:
+
+ ```json
+ {
+ "agent_name": "Your Agent",
+ "agent_id": "your-agent-user",
+ "mode": "user",
+ "mcp_custom": false,
+ "submission_type": "llm",
+ "organization": "Your Org",
+ "overall_score": 42.0,
+ "component_scores": {
+ "approach": 10.0,
+ "orchestration": 8.0,
+ "quality": 14.0,
+ "feasibility": 6.0,
+ "novelty": 2.0,
+ "diversity": 2.0
+ },
+ "taxonomy_scores": {
+ "de_novo_binder": {"ab": 45, "enz": 40, "sig": 43},
+ "sequence_optimization": {"ab": 50, "enz": 42, "sig": 38, "str": 44, "flu": 52},
+ "de_novo_backbone": {"str": 28},
+ "complex_engineering": {"enz": 40, "sig": 44, "str": 46},
+ "conformational_design": {"enz": 38, "sig": 42, "str": 40, "flu": 44}
+ },
+ "tasks_completed": 76,
+ "tasks_total": 76,
+ "tasks_with_zero": 4,
+ "avg_latency_sec": 50.0,
+ "submission_date": "2026-03-15"
+ }
+ ```
+
+ Update the `last_updated` field at the top of the JSON file after adding entries.
+
+ ## File overview
+
+ | File | Description |
+ |------|-------------|
+ | `app.py` | Main Gradio application with 7 tabs |
+ | `leaderboard_data.json` | Current benchmark results |
+ | `mcp_tool_schemas.json` | 17 reference MCP tool schemas |
+ | `eval_scorer.py` | Self-contained 100-point scoring rubric |
+ | `eval_queue.py` | Submission queue (HuggingFace Datasets) |
+ | `eval_dispatcher.py` | HTTP task dispatcher for benchmarking |
+ | `eval_boltz.py` | Boltz structure prediction post-eval |
+ | `eval_tasks.py` | Hidden task loader from HF Dataset |
+ | `example_server.py` | Reference FastAPI server for submitters |
+ | `requirements.txt` | Python dependencies |
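The update procedure in the README is described only in prose. A minimal pre-commit check is sketched below under stated assumptions: every key in the README example is treated as required, `mode` must be `null`, `"benchmark"`, or `"user"` (the values that appear in `leaderboard_data.json`), and `overall_score` is assumed to equal the sum of the six component scores up to rounding. That last rule is inferred from the data rather than documented: the README example sums exactly to 42.0, while the rescored Human Expert components sum to 62.5 against an `overall_score` of 62.4, hence the 0.5 tolerance. `validate_entry`, `add_entry`, and `update_file` are hypothetical helpers, not part of the repository.

```python
from __future__ import annotations

import json
from datetime import date

# Hypothetical helpers, not part of the BioDesignBench repository.
# Required keys mirror the README example entry (an assumption).
REQUIRED_KEYS = {
    "agent_name", "agent_id", "mode", "mcp_custom", "submission_type",
    "organization", "overall_score", "component_scores", "taxonomy_scores",
    "tasks_completed", "tasks_total", "tasks_with_zero", "avg_latency_sec",
    "submission_date",
}
COMPONENTS = ("approach", "orchestration", "quality",
              "feasibility", "novelty", "diversity")
MODES = (None, "benchmark", "user")  # values seen in leaderboard_data.json


def validate_entry(entry: dict, tol: float = 0.5) -> list[str]:
    """Return a list of problems; an empty list means the entry looks consistent."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    comps = entry["component_scores"]
    absent = [c for c in COMPONENTS if c not in comps]
    if absent:
        problems.append(f"missing components: {absent}")
    else:
        # Assumption: overall_score is the component sum, up to rounding.
        total = sum(comps[c] for c in COMPONENTS)
        if abs(total - entry["overall_score"]) > tol:
            problems.append(f"components sum to {total:.1f}, "
                            f"overall_score is {entry['overall_score']}")
    if entry["mode"] not in MODES:
        problems.append(f"unexpected mode: {entry['mode']!r}")
    if entry["tasks_completed"] > entry["tasks_total"]:
        problems.append("tasks_completed exceeds tasks_total")
    return problems


def add_entry(data: dict, entry: dict, today: str | None = None) -> dict:
    """Append a validated entry and bump last_updated, as the README asks."""
    problems = validate_entry(entry)
    if problems:
        raise ValueError("; ".join(problems))
    data["entries"].append(entry)
    data["last_updated"] = today or date.today().isoformat()
    return data


def update_file(path: str, entry: dict) -> None:
    """Read the leaderboard JSON, add the entry, and write it back."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    add_entry(data, entry)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)  # indent width is an arbitrary choice
        fh.write("\n")
```

`update_file("leaderboard_data.json", entry)` would append a checked entry and set `last_updated` to today's date. The `indent=2` formatting is a guess, since the diff view does not preserve the file's original indentation.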
app.py
@@ -20,7 +20,7 @@ from pathlib import Path
import gradio as gr
import plotly.graph_objects as go

- ADMIN_PASSWORD = os.environ.get("BDB_ADMIN_PASSWORD", "


# ═══════════════════════════════════════════════════════════════════
@@ -28,8 +28,8 @@ ADMIN_PASSWORD = os.environ.get("BDB_ADMIN_PASSWORD", "biodesignbench2026")
# ═══════════════════════════════════════════════════════════════════

PAPER_URL = "#"
- GITHUB_URL = "
- HF_URL = "


# ═══════════════════════════════════════════════════════════════════
import gradio as gr
import plotly.graph_objects as go

+ ADMIN_PASSWORD = os.environ.get("BDB_ADMIN_PASSWORD", "")


# ═══════════════════════════════════════════════════════════════════
# ═══════════════════════════════════════════════════════════════════

PAPER_URL = "#"
+ GITHUB_URL = "https://github.com/biodesignbench/biodesignbench"
+ HF_URL = "https://huggingface.co/spaces/biodesignbench/leaderboard"


# ═══════════════════════════════════════════════════════════════════
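With the hardcoded fallback removed, an unset `BDB_ADMIN_PASSWORD` now resolves to an empty string, which is why the deployment steps ask for the secret to be set in the Space settings. The diff does not show how `app.py` compares a submitted password; if it were a plain equality test, an empty submission would match an empty secret. The sketch below treats an empty secret as "admin disabled". `check_admin` is a hypothetical name, not a function from `app.py`; `hmac.compare_digest` is the standard-library comparison that avoids content-based short-circuiting.

```python
import hmac
import os

# Same lookup as the new line in app.py: "" when the secret is not set.
ADMIN_PASSWORD = os.environ.get("BDB_ADMIN_PASSWORD", "")


def check_admin(candidate: str, expected: str = ADMIN_PASSWORD) -> bool:
    """Hypothetical admin-login guard; app.py's real check is not in this diff."""
    if not expected:
        # No secret configured: keep the admin panel locked instead of
        # letting an empty password match an empty secret.
        return False
    # compare_digest does not short-circuit on the first differing byte,
    # which limits timing side channels on the secret's contents.
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```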
eval_scorer.py
@@ -1368,14 +1368,15 @@ def score_diversity(
return {"score": 0, "max": max_points, "num_designs": 0, "pairwise_diversity": 0.0, "entropy": 0.0}

num = len(designs)
- count_fraction = min(num / max_designs, 1.0) if max_designs > 0 else 1.0
diversity = mean_pairwise_diversity(designs)
entropy = sequence_entropy(designs)

-
-
-
-

return {
"score": min(total, max_points), "max": max_points,
return {"score": 0, "max": max_points, "num_designs": 0, "pairwise_diversity": 0.0, "entropy": 0.0}

num = len(designs)
diversity = mean_pairwise_diversity(designs)
entropy = sequence_entropy(designs)

+ # Score based purely on sequence diversity (not design count).
+ # Tasks don't specify how many designs to produce, so counting
+ # would unfairly penalise agents that submit fewer designs.
+ diversity_score = diversity * max_points * 0.65
+ entropy_score = entropy * max_points * 0.35
+ total = int(round(diversity_score + entropy_score))

return {
"score": min(total, max_points), "max": max_points,
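The new weighting is a fixed 65/35 blend of mean pairwise diversity and sequence entropy, scaled to the component's point budget, rounded, and capped at `max_points`; the design count no longer enters the score. The sketch below restates only that arithmetic, assuming both inputs are already normalised to [0, 1] (the diff does not show how `mean_pairwise_diversity` and `sequence_entropy` scale their outputs). `diversity_points` is a hypothetical helper, not part of `eval_scorer.py`.

```python
def diversity_points(diversity: float, entropy: float, max_points: int) -> int:
    """Restates the new weighting in score_diversity (hypothetical helper).

    Assumes `diversity` (mean pairwise diversity) and `entropy` are already
    normalised to [0, 1]; the diff does not show how the helpers scale them.
    """
    diversity_score = diversity * max_points * 0.65  # 65% of the budget
    entropy_score = entropy * max_points * 0.35      # 35% of the budget
    total = int(round(diversity_score + entropy_score))
    # With inputs in [0, 1] the sum never exceeds max_points, so the cap
    # only matters if an input is larger than 1.
    return min(total, max_points)
```

With a 10-point budget (an assumption, consistent with the Oracle's 10.0 diversity score in the rescored data), `diversity_points(0.4, 0.6, 10)` gives 2.6 + 2.1 = 4.7, which rounds to 5.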
leaderboard_data.json
@@ -1,34 +1,53 @@
{
- "last_updated": "2026-03-
"entries": [
{
- "agent_name": "
- "agent_id": "
"mode": null,
"mcp_custom": false,
- "submission_type": "
"organization": "Ground Truth",
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
"tasks_with_zero": 0,
"avg_latency_sec": null,
- "submission_date": "2026-03-
},
{
"agent_name": "Human Expert",
@@ -36,231 +55,335 @@
"mode": null,
"mcp_custom": false,
"submission_type": "human_expert",
- "organization": "
- "overall_score": 62.
- "component_scores": {
- "approach": 14.0,
- "orchestration": 11.0,
- "quality": 20.5,
- "feasibility": 10.5,
- "novelty": 2.5,
- "diversity": 3.5
- },
- "taxonomy_scores": {
- "de_novo_binder": {"ab": 65, "enz": 58, "sig": 63},
- "sequence_optimization": {"ab": 70, "enz": 62, "sig": 55, "str": 64, "flu": 72},
- "de_novo_backbone": {"str": 50},
- "complex_engineering": {"enz": 58, "sig": 62, "str": 66},
- "conformational_design": {"enz": 55, "sig": 60, "str": 58, "flu": 62}
- },
- "tasks_completed": 76,
- "tasks_total": 76,
- "tasks_with_zero": 2,
- "avg_latency_sec": null,
- "submission_date": "2026-03-01"
- },
- {
- "agent_name": "Hardcoded Pipeline",
- "agent_id": "hardcoded-pipeline",
- "mode": null,
- "mcp_custom": false,
- "submission_type": "hardcoded",
- "organization": "Deterministic",
- "overall_score": 41.5,
"component_scores": {
- "approach":
- "orchestration": 9.
- "quality": 12.
- "feasibility":
- "novelty":
- "diversity": 2.
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
"avg_latency_sec": null,
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
- "organization": "
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
- "mode":
"mcp_custom": false,
- "submission_type": "
- "organization": "
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
"diversity": 2.0
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
- "mode": "
"mcp_custom": false,
"submission_type": "llm",
- "organization": "
- "overall_score":
"component_scores": {
- "approach": 7.
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
- "organization": "
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
- "organization": "
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "Claude
- "agent_id": "
"mode": "benchmark",
"mcp_custom": false,
"submission_type": "llm",
"organization": "Anthropic",
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity": 1.
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
"agent_name": "GPT-5",
@@ -269,114 +392,142 @@
"mcp_custom": false,
"submission_type": "llm",
"organization": "OpenAI",
- "overall_score":
"component_scores": {
"approach": 5.2,
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity":
- },
- "taxonomy_scores": {
- "de_novo_binder": {"ab": 20, "enz": 16, "sig": 19},
- "sequence_optimization": {"ab": 23, "enz": 18, "sig": 14, "str": 19, "flu": 26},
- "de_novo_backbone": {"str": 10},
- "complex_engineering": {"enz": 16, "sig": 18, "str": 20},
- "conformational_design": {"enz": 14, "sig": 17, "str": 16, "flu": 18}
- },
- "tasks_completed": 76,
- "tasks_total": 76,
- "tasks_with_zero": 16,
- "avg_latency_sec": 42.0,
- "submission_date": "2026-03-01"
- },
- {
- "agent_name": "Deepseek-v3.2",
- "agent_id": "deepseek32-benchmark",
- "mode": "benchmark",
- "mcp_custom": false,
- "submission_type": "llm",
- "organization": "Deepseek",
- "overall_score": 16.0,
- "component_scores": {
- "approach": 4.5,
- "orchestration": 2.8,
- "quality": 5.0,
- "feasibility": 2.2,
- "novelty": 0.7,
- "diversity": 0.8
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "Gemini
- "agent_id": "
- "mode": "
"mcp_custom": false,
"submission_type": "llm",
"organization": "Google",
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity": 1.
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
},
{
- "agent_name": "
- "agent_id": "
"mode": "benchmark",
"mcp_custom": false,
"submission_type": "llm",
- "organization": "
- "overall_score":
"component_scores": {
- "approach":
- "orchestration":
- "quality":
- "feasibility":
- "novelty":
- "diversity": 1.
},
"taxonomy_scores": {
- "de_novo_binder": {
-
-
-
-
},
"tasks_completed": 76,
"tasks_total": 76,
- "tasks_with_zero":
- "avg_latency_sec":
- "submission_date": "2026-03-
}
]
- }
{
+ "last_updated": "2026-03-10",
"entries": [
{
+ "agent_name": "Oracle",
+ "agent_id": "oracle",
"mode": null,
"mcp_custom": false,
+ "submission_type": "oracle",
"organization": "Ground Truth",
+ "overall_score": 87.3,
"component_scores": {
+ "approach": 20.0,
+ "orchestration": 15.0,
+ "quality": 22.3,
+ "feasibility": 15.0,
+ "novelty": 5.0,
+ "diversity": 10.0
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 74.0,
+ "bnd": 82.0,
+ "scf": 92.0
+ },
+ "conformational_design": {
+ "enz": 92.0,
+ "fp": 96.0,
+ "scf": 81.0
+ },
+ "complex_engineering": {
+ "enz": 75.0,
+ "bnd": 84.0,
+ "scf": 78.0
+ },
+ "de_novo_backbone": {
+ "scf": 98.0
+ },
+ "sequence_optimization": {
+ "enz": 99.0,
+ "fp": 97.0,
+ "ab": 98.0,
+ "scf": 98.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
"tasks_with_zero": 0,
"avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
"agent_name": "Human Expert",
"mode": null,
"mcp_custom": false,
"submission_type": "human_expert",
+ "organization": "Romero Lab",
+ "overall_score": 62.4,
"component_scores": {
+ "approach": 19.0,
+ "orchestration": 9.9,
+ "quality": 12.9,
+ "feasibility": 13.6,
+ "novelty": 4.5,
+ "diversity": 2.6
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 57.0,
+ "bnd": 71.0,
+ "scf": 70.0
+ },
+ "conformational_design": {
+ "enz": 68.0,
+ "fp": 59.0,
+ "scf": 50.0
+ },
+ "complex_engineering": {
+ "enz": 40.0,
+ "bnd": 76.0,
+ "scf": 67.0
+ },
+ "de_novo_backbone": {
+ "scf": 84.0
+ },
+ "sequence_optimization": {
+ "enz": 48.0,
+ "fp": 51.0,
+ "ab": 65.0,
+ "scf": 54.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 0,
"avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "DeepSeek V3",
+ "agent_id": "deepseek-v3-user",
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
+ "organization": "DeepSeek",
+ "overall_score": 58.4,
"component_scores": {
+ "approach": 12.8,
+ "orchestration": 10.0,
+ "quality": 15.6,
+ "feasibility": 12.2,
+ "novelty": 4.3,
+ "diversity": 3.4
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 55.0,
+ "bnd": 63.0,
+ "scf": 56.0
+ },
+ "conformational_design": {
+ "enz": 48.0,
+ "fp": 56.0,
+ "scf": 54.0
+ },
+ "complex_engineering": {
+ "enz": 56.0,
+ "bnd": 66.0,
+ "scf": 60.0
+ },
+ "de_novo_backbone": {
+ "scf": 37.0
+ },
+ "sequence_optimization": {
+ "enz": 61.0,
+ "fp": 66.0,
+ "ab": 83.0,
+ "scf": 62.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 1,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "Hardcoded Pipeline",
+ "agent_id": "hardcoded-pipeline",
+ "mode": null,
"mcp_custom": false,
+ "submission_type": "hardcoded",
+ "organization": "Deterministic",
+ "overall_score": 52.4,
"component_scores": {
+ "approach": 12.1,
+ "orchestration": 9.9,
+ "quality": 14.8,
+ "feasibility": 9.7,
+ "novelty": 3.8,
"diversity": 2.0
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 45.0,
+ "bnd": 56.0,
+ "scf": 67.0
+ },
+ "conformational_design": {
+ "enz": 38.0,
+ "fp": 27.0,
+ "scf": 35.0
+ },
+ "complex_engineering": {
+ "enz": 57.0,
+ "bnd": 64.0,
+ "scf": 64.0
+ },
+ "de_novo_backbone": {
+ "scf": 11.0
+ },
+ "sequence_optimization": {
+ "enz": 70.0,
+ "fp": 67.0,
+ "ab": 57.0,
+ "scf": 75.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 5,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "DeepSeek V3",
+ "agent_id": "deepseek-v3-benchmark",
+ "mode": "benchmark",
"mcp_custom": false,
"submission_type": "llm",
+ "organization": "DeepSeek",
+ "overall_score": 50.5,
"component_scores": {
+ "approach": 7.1,
+ "orchestration": 7.2,
+ "quality": 16.1,
+ "feasibility": 13.2,
+ "novelty": 4.1,
+ "diversity": 3.0
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 46.0,
+ "bnd": 53.0,
+ "scf": 47.0
+ },
+ "conformational_design": {
+ "enz": 44.0,
+ "fp": 62.0,
+ "scf": 38.0
+ },
+ "complex_engineering": {
+ "enz": 33.0,
+ "bnd": 56.0,
+ "scf": 52.0
+ },
+ "de_novo_backbone": {
+ "scf": 54.0
+ },
+ "sequence_optimization": {
+ "enz": 55.0,
+ "fp": 41.0,
+ "ab": 69.0,
+ "scf": 72.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 2,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "GPT-5",
+ "agent_id": "gpt5-user",
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
+ "organization": "OpenAI",
+ "overall_score": 49.2,
"component_scores": {
+ "approach": 7.9,
+ "orchestration": 7.6,
+ "quality": 15.3,
+ "feasibility": 11.1,
+ "novelty": 4.1,
+ "diversity": 3.1
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 43.0,
+ "bnd": 55.0,
+ "scf": 54.0
+ },
+ "conformational_design": {
+ "enz": 32.0,
+ "fp": 40.0,
+ "scf": 39.0
+ },
+ "complex_engineering": {
+ "enz": 43.0,
+ "bnd": 57.0,
+ "scf": 53.0
+ },
+ "de_novo_backbone": {
+ "scf": 45.0
+ },
+ "sequence_optimization": {
+ "enz": 48.0,
+ "fp": 52.0,
+ "ab": 71.0,
+ "scf": 62.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 3,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "Claude Sonnet 4.5",
+ "agent_id": "sonnet-4.5-user",
"mode": "user",
"mcp_custom": false,
"submission_type": "llm",
+ "organization": "Anthropic",
+ "overall_score": 47.9,
"component_scores": {
+ "approach": 8.6,
+ "orchestration": 7.8,
+ "quality": 15.0,
+ "feasibility": 10.9,
+ "novelty": 3.4,
+ "diversity": 2.2
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 42.0,
+ "bnd": 53.0,
+ "scf": 38.0
+ },
+ "conformational_design": {
+ "enz": 42.0,
+ "fp": 47.0,
+ "scf": 35.0
+ },
+ "complex_engineering": {
+ "enz": 48.0,
+ "bnd": 66.0,
+ "scf": 53.0
+ },
+ "de_novo_backbone": {
+ "scf": 33.0
+ },
+ "sequence_optimization": {
+ "enz": 48.0,
+ "fp": 60.0,
+ "ab": 67.0,
+ "scf": 18.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 6,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "Claude Sonnet 4.5",
+ "agent_id": "sonnet-4.5-benchmark",
"mode": "benchmark",
"mcp_custom": false,
"submission_type": "llm",
"organization": "Anthropic",
+ "overall_score": 42.3,
"component_scores": {
+ "approach": 6.0,
+ "orchestration": 6.2,
+ "quality": 13.8,
+ "feasibility": 11.4,
+ "novelty": 3.2,
+ "diversity": 1.7
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 32.0,
+ "bnd": 44.0,
+ "scf": 36.0
+ },
+ "conformational_design": {
+ "enz": 17.0,
+ "fp": 56.0,
+ "scf": 41.0
+ },
+ "complex_engineering": {
+ "enz": 44.0,
+ "bnd": 55.0,
+ "scf": 37.0
+ },
+ "de_novo_backbone": {
+ "scf": 44.0
+ },
+ "sequence_optimization": {
+ "enz": 40.0,
+ "fp": 51.0,
+ "ab": 58.0,
+ "scf": 20.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 9,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
"agent_name": "GPT-5",
"mcp_custom": false,
"submission_type": "llm",
"organization": "OpenAI",
+ "overall_score": 41.0,
"component_scores": {
"approach": 5.2,
+ "orchestration": 4.9,
+ "quality": 15.0,
+ "feasibility": 11.5,
+ "novelty": 3.5,
+ "diversity": 0.9
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 32.0,
+ "bnd": 41.0,
+ "scf": 45.0
+ },
+ "conformational_design": {
+ "enz": 22.0,
+ "fp": 55.0,
+ "scf": 40.0
+ },
+ "complex_engineering": {
+ "enz": 3.0,
+ "bnd": 49.0,
+ "scf": 26.0
+ },
+ "de_novo_backbone": {
+ "scf": 45.0
+ },
+ "sequence_optimization": {
+ "enz": 44.0,
+ "fp": 52.0,
+ "ab": 52.0,
+ "scf": 49.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 5,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "Gemini 2.5 Pro",
+ "agent_id": "gemini-2.5-pro-user",
+ "mode": "user",
"mcp_custom": false,
"submission_type": "llm",
"organization": "Google",
+ "overall_score": 26.2,
"component_scores": {
+ "approach": 0.0,
+ "orchestration": 0.0,
+ "quality": 10.3,
+ "feasibility": 10.9,
+ "novelty": 3.5,
+ "diversity": 1.5
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 22.0,
+ "bnd": 36.0,
+ "scf": 28.0
+ },
+ "conformational_design": {
+ "enz": 8.0,
+ "fp": 9.0,
+ "scf": 10.0
+ },
+ "complex_engineering": {
+ "enz": 12.0,
+ "bnd": 35.0,
+ "scf": 22.0
+ },
+ "de_novo_backbone": {
+ "scf": 21.0
+ },
+ "sequence_optimization": {
+ "enz": 33.0,
+ "fp": 36.0,
+ "ab": 53.0,
+ "scf": 22.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 15,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
},
{
+ "agent_name": "Gemini 2.5 Pro",
+ "agent_id": "gemini-2.5-pro-benchmark",
"mode": "benchmark",
"mcp_custom": false,
"submission_type": "llm",
+ "organization": "Google",
+ "overall_score": 25.8,
"component_scores": {
+ "approach": 0.0,
+ "orchestration": 0.0,
+ "quality": 10.1,
+ "feasibility": 10.7,
+ "novelty": 3.4,
+ "diversity": 1.6
},
"taxonomy_scores": {
+ "de_novo_binder": {
+ "ab": 28.0,
+ "bnd": 35.0,
+ "scf": 20.0
+ },
+ "conformational_design": {
+ "enz": 16.0,
+ "fp": 22.0,
+ "scf": 6.0
+ },
+ "complex_engineering": {
+ "enz": 0.0,
+ "bnd": 32.0,
+ "scf": 27.0
+ },
+ "de_novo_backbone": {
+ "scf": 21.0
+ },
+ "sequence_optimization": {
+ "enz": 30.0,
+ "fp": 33.0,
+ "ab": 52.0,
+ "scf": 15.0
+ }
},
"tasks_completed": 76,
"tasks_total": 76,
+ "tasks_with_zero": 17,
+ "avg_latency_sec": null,
+ "submission_date": "2026-03-10"
}
]
+ }
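The rescored file is enough to approximate two of the views the README describes: per-type averages (the Taxonomy Breakdown bar chart) and user-minus-benchmark deltas (the Benchmark vs User Mode tab). The sketch below assumes the per-type bar is an unweighted mean of that type's cells and that runs are paired by `agent_name`; `app.py`'s actual aggregation is not part of this diff. `SAMPLE` holds trimmed copies of rescored entries shown above, and all function names are hypothetical.

```python
from __future__ import annotations

import json
from collections import defaultdict
from statistics import mean

# Trimmed copies of rescored entries from the diff above; most fields omitted.
SAMPLE = [
    {"agent_name": "DeepSeek V3", "mode": "user", "submission_type": "llm",
     "overall_score": 58.4,
     "taxonomy_scores": {
         "de_novo_binder": {"ab": 55.0, "bnd": 63.0, "scf": 56.0},
         "conformational_design": {"enz": 48.0, "fp": 56.0, "scf": 54.0},
         "complex_engineering": {"enz": 56.0, "bnd": 66.0, "scf": 60.0},
         "de_novo_backbone": {"scf": 37.0},
         "sequence_optimization": {"enz": 61.0, "fp": 66.0, "ab": 83.0, "scf": 62.0},
     }},
    {"agent_name": "DeepSeek V3", "mode": "benchmark", "submission_type": "llm",
     "overall_score": 50.5},
    {"agent_name": "Gemini 2.5 Pro", "mode": "user", "submission_type": "llm",
     "overall_score": 26.2},
    {"agent_name": "Gemini 2.5 Pro", "mode": "benchmark", "submission_type": "llm",
     "overall_score": 25.8},
    {"agent_name": "Oracle", "mode": None, "submission_type": "oracle",
     "overall_score": 87.3},
]


def load_entries(path: str = "leaderboard_data.json") -> list[dict]:
    """Read the entries list from the leaderboard file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["entries"]


def per_type_averages(entry: dict) -> dict[str, float]:
    """Unweighted mean of each task type's cells, rounded to one decimal."""
    return {task: round(mean(list(cells.values())), 1)
            for task, cells in entry["taxonomy_scores"].items()}


def mode_deltas(entries: list[dict]) -> dict[str, float]:
    """User-mode minus benchmark-mode overall_score for LLMs run in both modes."""
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    for e in entries:
        if e["submission_type"] == "llm" and e["mode"] in ("user", "benchmark"):
            scores[e["agent_name"]][e["mode"]] = e["overall_score"]
    return {name: round(m["user"] - m["benchmark"], 1)
            for name, m in scores.items()
            if "user" in m and "benchmark" in m}
```

On the sample, `per_type_averages(SAMPLE[0])` gives 58.0 for `de_novo_binder` and 68.0 for `sequence_optimization`, and `mode_deltas(SAMPLE)` gives 7.9 points for DeepSeek V3 and 0.4 for Gemini 2.5 Pro. Because `mode_deltas` pairs by name, two distinct agents sharing an `agent_name` would be conflated; pairing on an `agent_id` prefix would be a stricter alternative.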