NanoAgent-GGKE v4: Graph-Guided Knowledge Evolution for Nanobody Design

Overview

NanoAgent-GGKE is a computational nanobody design framework in which a Knowledge Graph (KG) guides the design process across multiple targets. The KG accumulates cross-target design knowledge and uses it to steer mutations, CDR composition, and strategy selection at every design step.

Key Innovation

KG Active Guidance: Unlike passive approaches, our KG directly guides scaffold mutations, strategy parameters, failure avoidance, and CDR composition
Cross-Target Knowledge Transfer: Design knowledge from one antigen transfers to improve designs for new, unseen antigens
Verifiable Design Quality: Silver Standard retrospective validation against experimentally validated SAbDab nanobodies

Results Summary

Silver Standard Retrospective Validation (30 SAbDab pairs)

Method	Recovery Score	CDR3 Comp Cosine	CDR3 Prop Cosine	Length Match
Random Baseline	42.2	0.347	0.794	53.3%
Template Baseline	40.1	0.498	0.860	3.3%
v4 No-KG	41.3	0.506	0.851	13.3%
v4 Full (KG+FB)	47.5	0.492	0.821	53.3%

Recovery Score gain: v4 Full vs Random Baseline: +5.3 | v4 Full vs v4 No-KG: +6.2 (KG value)
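
The table does not define "CDR3 Comp Cosine". One plausible reading is the cosine similarity between the amino-acid composition vectors of the designed and the reference CDR3. The sketch below illustrates that reading only; the function names are hypothetical and are not taken from run_v4_retrospective.py.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cdr3_composition_cosine(designed, reference):
    """Assumed definition: cosine similarity of CDR3 amino-acid compositions."""
    return cosine_similarity(composition_vector(designed),
                             composition_vector(reference))
```

Under this reading the metric ignores residue order and length, which would explain why a method can score well on composition while rarely matching the reference CDR3 length.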

Full Experiments (Nature Score, 0-100)

Experiment	Mean	±Std	Grades
E1 Baseline (no KG, no FB)	62.6	8.1	B:5, C:5
E2 KG Only	57.7	2.3	B:3, C:7
E3 Full v4 (KG+FB)	68.4	1.0	B:10
E4 Scaling (30 targets)	68.5	0.8	B:30
E5 Cross-Transfer	68.7	0.6	B:10
E7 Feedback Only	70.2	2.6	B:10
E9 Large Scale (50)	68.5	0.8	B:25

Key Comparisons

E1→E3 (full system): +5.8
FB value (E2→E3): +10.7
Learning curve (batches 1→5): converges by batch 2 (+0.8)
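
The first two comparisons are differences between the mean Nature Scores in the table above. A minimal check, assuming only those reported means (the dictionary and helper name are illustrative):

```python
# Mean Nature Scores copied from the Full Experiments table.
MEAN_SCORE = {
    "E1": 62.6,  # Baseline (no KG, no FB)
    "E2": 57.7,  # KG Only
    "E3": 68.4,  # Full v4 (KG+FB)
}


def delta(start, end):
    """Difference in mean Nature Score going from one experiment to another."""
    return round(MEAN_SCORE[end] - MEAN_SCORE[start], 1)


full_system_gain = delta("E1", "E3")  # E1→E3 (full system)
feedback_gain = delta("E2", "E3")     # FB value (E2→E3)
```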

Architecture

┌─────────────────────────────────────────────────┐
│  NanoAgent-GGKE v4 Pipeline                     │
├─────────────────────────────────────────────────┤
│  1. Target Analysis (PDB fetch + ESM embedding) │
│  2. KG-Guided Scaffold Mutation                 │
│  3. CDR Adaptation (ProteinMPNN + ESM scoring)  │
│  4. Greedy Feedback Loop (4 channels)           │
│  5. Composite Scoring (Nature Score)            │
│  6. KG Update (accumulate knowledge)            │
└─────────────────────────────────────────────────┘

KG Active Guidance (4 channels):
  ├── guide_scaffold_mutations()    → position + AA suggestions
  ├── guide_strategy_params()       → CDR length, composition bias
  ├── guide_failure_avoidance()     → avoid known bad patterns
  └── guide_cdr_composition()       → charge/hydrophobicity targets
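
To make the four channels concrete, here is a toy sketch of how a KG could turn accumulated design records into guidance. Only the four method names come from the diagram above. The ToyKG class, its record fields, the signatures, and the return shapes are assumptions for illustration and do not describe nanokg_v4.py.

```python
from dataclasses import dataclass, field

HYDROPHOBIC = set("AVILMFW")      # simplified hydrophobic set (assumption)
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


@dataclass
class ToyKG:
    """Hypothetical stand-in for the KG: a flat list of past design records."""
    records: list = field(default_factory=list)

    def update(self, target, mutation, cdr3, score, failed):
        """Accumulate one design outcome (pipeline step 6)."""
        self.records.append({"target": target, "mutation": mutation,
                             "cdr3": cdr3, "score": score, "failed": failed})

    def _good(self):
        return [r for r in self.records if not r["failed"]]

    def guide_scaffold_mutations(self, top_k=3):
        """Position + AA suggestions from the best-scoring successful designs."""
        ranked = sorted(self._good(), key=lambda r: r["score"], reverse=True)
        return [r["mutation"] for r in ranked[:top_k]]

    def guide_strategy_params(self):
        """Suggest a CDR3 length: the mean length among successful designs."""
        good = self._good()
        if not good:
            return {}
        return {"cdr3_length": round(sum(len(r["cdr3"]) for r in good) / len(good))}

    def guide_failure_avoidance(self):
        """Mutations seen in failed designs, to be avoided."""
        return {r["mutation"] for r in self.records if r["failed"]}

    def guide_cdr_composition(self):
        """Mean net charge and hydrophobic fraction of successful CDR3s."""
        good = self._good()
        if not good:
            return {}
        charges = [sum(CHARGE.get(aa, 0) for aa in r["cdr3"]) for r in good]
        hydro = [sum(aa in HYDROPHOBIC for aa in r["cdr3"]) / len(r["cdr3"])
                 for r in good]
        return {"net_charge": sum(charges) / len(good),
                "hydrophobic_fraction": sum(hydro) / len(good)}


# Usage with made-up records (targets, mutations, and scores are hypothetical).
kg = ToyKG()
kg.update("T1", (37, "F"), "AKDRW", 68.4, failed=False)
kg.update("T2", (44, "E"), "AAKDLYW", 70.2, failed=False)
kg.update("T3", (47, "G"), "GGGG", 41.0, failed=True)
suggestions = kg.guide_scaffold_mutations(top_k=1)
```

Because every channel reads from records gathered on earlier targets, guidance for a new target draws on prior designs, which is the cross-target transfer idea in miniature.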

Scoring

Nature Score = Structure(40%) + Sequence(30%) + Developability(30%)

Structure: ESMFold pLDDT + pTM
Sequence: ESM-1b pseudo-perplexity + CDR diversity
Developability: CamSol solubility + charge balance + no aggregation motifs
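
The weighted sum can be sketched as below. This assumes each component has already been normalized to 0-100; how the individual metrics (pLDDT, pTM, pseudo-perplexity, CamSol, and so on) are combined within a component is not specified here, so the sketch takes the three component scores as inputs. The function name is illustrative and is not the API of composite_v3d.py.

```python
# Component weights as stated in the Scoring section.
WEIGHTS = {"structure": 0.40, "sequence": 0.30, "developability": 0.30}


def nature_score(structure, sequence, developability):
    """Weighted composite of three component scores, each assumed to be 0-100."""
    components = {
        "structure": structure,
        "sequence": sequence,
        "developability": developability,
    }
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be in [0, 100], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, component scores of 80, 60, and 70 contribute 32, 18, and 21 points respectively.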

Project Structure

├── code/                     # Main experiment scripts
│   ├── run_v4.py            # Core pipeline (VHH scaffold, CDR adapt, feedback)
│   ├── run_v4_full.py       # Full experiment suite (E1-E10 + S1)
│   └── run_v4_retrospective.py  # Silver Standard validation
├── src/                      # Module source code
│   └── virtual_lab/
│       ├── harness/composite_v3d.py    # Nature Score computation
│       ├── knowledge_graph/nanokg_v4.py # KG with active guidance
│       └── skills/                      # ESM, ESMFold, ProteinMPNN, etc.
├── data/                     # Datasets
│   └── retrospective_test_set.json  # 50 SAbDab nanobody-antigen pairs
├── results/                  # Experiment results
│   ├── retrospective_summary.json
│   └── master_summary_condensed.json
└── deploy/                   # Deployment tools
    ├── deploy.sh            # One-click GPU setup script
    ├── pip_requirements.txt
    └── nanoagent_v4_complete_deploy.tar.gz

Quick Start (New GPU)

# 1. Upload and extract
tar xzf deploy/nanoagent_v4_complete_deploy.tar.gz
cd nanoagent && bash deploy.sh

# 2. Run experiments
python3 run_v4.py                    # Fast validation (3 targets, ~5min)
python3 run_v4_full.py               # Full suite (E1-E10, ~35min)
python3 run_v4_retrospective.py      # Silver Standard (~30min)

Requirements

GPU: NVIDIA with ≥16GB VRAM (tested on RTX 4090)
CUDA 12.x
Python 3.10+
PyTorch 2.6+
ESM, ESMFold, ProteinMPNN (auto-installed by deploy.sh)
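
Before running deploy.sh it can help to confirm that the machine meets the Python and GPU requirements. The following preflight check is a sketch, not part of the repository: the function name and report keys are made up, and VRAM is compared in decimal GB because drivers report slightly less than the nominal card size.

```python
import sys


def check_environment(min_python=(3, 10), min_vram_gb=16):
    """Report whether Python and GPU memory meet the stated requirements."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_version": None,
        "cuda_available": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    try:
        import torch  # optional: only needed for the GPU checks
    except ImportError:
        return report

    report["torch_version"] = torch.__version__
    if torch.cuda.is_available():
        report["cuda_available"] = True
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = total_bytes / 1e9  # decimal GB
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```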

Date

Experiments run: 2026-05-01
Total compute: ~70 min on a single GPU
