NanoAgent-GGKE v4: Graph-Guided Knowledge Evolution for Nanobody Design

Overview

NanoAgent-GGKE is a computational nanobody design framework in which a Knowledge Graph (KG) guides the design process across multiple targets. The key innovation is that the KG accumulates cross-target design knowledge and uses it to steer mutations, CDR composition, and strategy selection at every design step.

Key Innovation

  • KG Active Guidance: Rather than serving as a passive record of past designs, the KG directly guides scaffold mutations, strategy parameters, failure avoidance, and CDR composition
  • Cross-Target Knowledge Transfer: Design knowledge from one antigen transfers to improve designs for new, unseen antigens
  • Verifiable Design Quality: Silver Standard retrospective validation against experimentally validated SAbDab nanobodies

Results Summary

Silver Standard Retrospective Validation (30 SAbDab pairs)

Method              Recovery Score   CDR3 Comp Cosine   CDR3 Prop Cosine   Length Match
Random Baseline     42.2             0.347              0.794              53.3%
Template Baseline   40.1             0.498              0.860              3.3%
v4 No-KG            41.3             0.506              0.851              13.3%
v4 Full (KG+FB)     47.5             0.492              0.821              53.3%

Recovery Score deltas: v4 Full vs Random Baseline: +5.3 | v4 Full vs v4 No-KG: +6.2 (KG value)
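
The cosine columns and the Length Match column are not defined in this README. One plausible reading, offered as an assumption and not as the project's actual metric code, is that "CDR3 Comp Cosine" compares the amino-acid composition vectors of the designed and reference CDR3, and that "Length Match" is the fraction of pairs whose CDR3 lengths are equal. The function names and example sequences below are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_vector(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (non-standard letters ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cdr3_comp_cosine(designed: str, reference: str) -> float:
    """Assumed definition: cosine between composition vectors of two CDR3s."""
    return cosine(composition_vector(designed), composition_vector(reference))


def length_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Assumed definition: share of (designed, reference) pairs with equal length."""
    if not pairs:
        return 0.0
    return sum(len(d) == len(r) for d, r in pairs) / len(pairs)


# Made-up sequences, for illustration only.
same_composition = cdr3_comp_cosine("ARDY", "YDRA")   # order differs, composition identical
no_overlap = cdr3_comp_cosine("AAAA", "GGGG")         # no shared residues
match_rate = length_match_rate([("ARDY", "YDRA"), ("AR", "YDRA")])
```

Because composition ignores residue order, a high composition cosine can coexist with a low recovery score, which is one reason to report both.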

Full Experiments (Nature Score, 0-100)

Experiment                   Mean   ±Std   Grades
E1 Baseline (no KG, no FB)   62.6   8.1    B:5, C:5
E2 KG Only                   57.7   2.3    B:3, C:7
E3 Full v4 (KG+FB)           68.4   1.0    B:10
E4 Scaling (30 targets)      68.5   0.8    B:30
E5 Cross-Transfer            68.7   0.6    B:10
E7 Feedback Only             70.2   2.6    B:10
E9 Large Scale (50)          68.5   0.8    B:25

Key Comparisons

  • E1→E3 (full system): +5.8
  • Feedback (FB) value (E2→E3): +10.7
  • Learning curve (batches 1→5): converges by batch 2 (+0.8)
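
The first two deltas above follow directly from the mean Nature Scores in the Full Experiments table. A short check, using only the reported means (the `means` dictionary and `delta` helper are illustrative names):

```python
# Mean Nature Scores copied from the Full Experiments table.
means = {"E1": 62.6, "E2": 57.7, "E3": 68.4}


def delta(start: str, end: str) -> float:
    """Difference in mean score between two experiments, rounded to one decimal."""
    return round(means[end] - means[start], 1)


full_system_gain = delta("E1", "E3")  # baseline -> full v4
feedback_gain = delta("E2", "E3")     # KG only -> KG + feedback
kg_only_change = delta("E1", "E2")    # baseline -> KG only

print(full_system_gain, feedback_gain, kg_only_change)
```

The same arithmetic gives a negative E1→E2 change, so the reported means suggest that the KG contributes mainly in combination with the feedback loop.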

Architecture

┌─────────────────────────────────────────────────┐
│  NanoAgent-GGKE v4 Pipeline                     │
├─────────────────────────────────────────────────┤
│  1. Target Analysis (PDB fetch + ESM embedding) │
│  2. KG-Guided Scaffold Mutation                 │
│  3. CDR Adaptation (ProteinMPNN + ESM scoring)  │
│  4. Greedy Feedback Loop (4 channels)           │
│  5. Composite Scoring (Nature Score)            │
│  6. KG Update (accumulate knowledge)            │
└─────────────────────────────────────────────────┘
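
Step 4 is described only as a greedy feedback loop. As a rough sketch of what greedy acceptance generally means, and not a reproduction of `run_v4.py`, the loop below proposes single-residue substitutions at allowed positions and keeps a candidate only if it scores strictly higher. The function `greedy_refine`, its parameters, and the toy scoring function are all assumptions made for illustration.

```python
import random
from typing import Callable

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def greedy_refine(
    seq: str,
    score_fn: Callable[[str], float],
    positions: list[int],
    rounds: int = 50,
    seed: int = 0,
) -> tuple[str, float]:
    """Greedy hill climbing: accept a point mutation only if the score improves."""
    rng = random.Random(seed)          # local RNG keeps runs reproducible
    best, best_score = seq, score_fn(seq)
    for _ in range(rounds):
        pos = rng.choice(positions)    # only mutate allowed (e.g. CDR) positions
        aa = rng.choice(ALPHABET)
        candidate = best[:pos] + aa + best[pos + 1:]
        cand_score = score_fn(candidate)
        if cand_score > best_score:    # strictly better, otherwise discard
            best, best_score = candidate, cand_score
    return best, best_score


def toy_score(s: str) -> float:
    """Toy stand-in for a real scorer: counts tyrosines."""
    return float(s.count("Y"))


start = "AAAAAA"
refined, refined_score = greedy_refine(start, toy_score, positions=[2, 3, 4])
```

In the real pipeline the scorer would presumably be the composite score and proposals would come from the KG and ProteinMPNN instead of uniform random choice.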

KG Active Guidance (4 channels):
  ├── guide_scaffold_mutations()    → position + AA suggestions
  ├── guide_strategy_params()       → CDR length, composition bias
  ├── guide_failure_avoidance()     → avoid known bad patterns
  └── guide_cdr_composition()       → charge/hydrophobicity targets
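
To make the idea of active guidance concrete, here is a toy in-memory stand-in for two of the channels listed above. It borrows the method names `guide_scaffold_mutations` and `guide_failure_avoidance` from the list, but the class `ToyGuidanceKG`, the `record` method, the signatures, and the stored data are hypothetical; the real `nanokg_v4.py` may work quite differently.

```python
from collections import defaultdict
from statistics import mean


class ToyGuidanceKG:
    """Toy knowledge store: maps (position, amino acid) to observed score changes."""

    def __init__(self) -> None:
        self._outcomes: dict[tuple[int, str], list[float]] = defaultdict(list)

    def record(self, position: int, aa: str, score_delta: float) -> None:
        """Accumulate the outcome of one tried mutation (the 'KG update' step)."""
        self._outcomes[(position, aa)].append(score_delta)

    def guide_scaffold_mutations(self, top_k: int = 3) -> list[tuple[int, str]]:
        """Suggest mutations whose average past effect was positive, best first."""
        ranked = sorted(
            ((mean(d), pos, aa) for (pos, aa), d in self._outcomes.items() if mean(d) > 0),
            reverse=True,
        )
        return [(pos, aa) for _, pos, aa in ranked[:top_k]]

    def guide_failure_avoidance(self) -> set[tuple[int, str]]:
        """Return mutations whose average past effect was negative."""
        return {key for key, d in self._outcomes.items() if mean(d) < 0}


# Made-up observations, for illustration only.
kg = ToyGuidanceKG()
kg.record(44, "E", 1.5)
kg.record(44, "E", 0.5)
kg.record(37, "F", 2.0)
kg.record(47, "W", -1.0)

suggestions = kg.guide_scaffold_mutations(top_k=2)
avoid = kg.guide_failure_avoidance()
```

Averaging outcomes across targets is the simplest way to express cross-target transfer; a fuller version would condition suggestions on target similarity.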

Scoring

Nature Score = Structure (40%) + Sequence (30%) + Developability (30%)

  • Structure: ESMFold pLDDT + pTM
  • Sequence: ESM-1b pseudo-perplexity + CDR diversity
  • Developability: CamSol solubility + charge balance + no aggregation motifs
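
The weighting above can be written as a plain weighted sum. The sketch below assumes each component has already been normalized to a 0-100 sub-score, which the README does not state; the function name `nature_score` and the example inputs are hypothetical, and `composite_v3d.py` may combine its terms differently.

```python
# Component weights taken from the formula above.
WEIGHTS = {"structure": 0.40, "sequence": 0.30, "developability": 0.30}


def nature_score(structure: float, sequence: float, developability: float) -> float:
    """Weighted sum of three sub-scores, each assumed to lie in [0, 100]."""
    parts = {
        "structure": structure,
        "sequence": sequence,
        "developability": developability,
    }
    for name, value in parts.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} sub-score {value} is outside [0, 100]")
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


# Made-up sub-scores: 0.4*80 + 0.3*60 + 0.3*70, roughly 71.
example = nature_score(structure=80.0, sequence=60.0, developability=70.0)
```

Since the weights sum to 1, the composite stays on the same 0-100 scale as its inputs.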

Project Structure

├── code/                     # Main experiment scripts
│   ├── run_v4.py            # Core pipeline (VHH scaffold, CDR adapt, feedback)
│   ├── run_v4_full.py       # Full experiment suite (E1-E10 + S1)
│   └── run_v4_retrospective.py  # Silver Standard validation
├── src/                      # Module source code
│   └── virtual_lab/
│       ├── harness/composite_v3d.py    # Nature Score computation
│       ├── knowledge_graph/nanokg_v4.py # KG with active guidance
│       └── skills/                      # ESM, ESMFold, ProteinMPNN, etc.
├── data/                     # Datasets
│   └── retrospective_test_set.json  # 50 SAbDab nanobody-antigen pairs
├── results/                  # Experiment results
│   ├── retrospective_summary.json
│   └── master_summary_condensed.json
└── deploy/                   # Deployment tools
    ├── deploy.sh            # One-click GPU setup script
    ├── pip_requirements.txt
    └── nanoagent_v4_complete_deploy.tar.gz

Quick Start (New GPU)

# 1. Upload and extract
tar xzf deploy/nanoagent_v4_complete_deploy.tar.gz
cd nanoagent && bash deploy.sh

# 2. Run experiments
python3 run_v4.py                    # Fast validation (3 targets, ~5min)
python3 run_v4_full.py               # Full suite (E1-E10, ~35min)
python3 run_v4_retrospective.py      # Silver Standard (~30min)

Requirements

  • GPU: NVIDIA with ≥16 GB VRAM (tested on RTX 4090)
  • CUDA 12.x
  • Python 3.10+
  • PyTorch 2.6+
  • ESM, ESMFold, ProteinMPNN (auto-installed by deploy.sh)
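
Before running `deploy.sh`, it can help to confirm that the machine meets the Python and GPU requirements listed above. The helper below is a hypothetical convenience script, not part of the repository. It uses only the standard library plus PyTorch's `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`, and it degrades gracefully when PyTorch is not installed.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10), min_vram_gb=16.0) -> dict:
    """Report whether Python, PyTorch, CUDA, and GPU memory meet the stated minimums."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "vram_ok": False,
    }
    if report["torch_installed"]:
        try:
            import torch
        except ImportError:
            # Package is present but cannot be imported (for example, a broken install).
            report["torch_installed"] = False
            return report
        if torch.cuda.is_available():
            report["cuda_available"] = True
            # total_memory is in bytes; a nominal 16 GB card may report slightly less.
            total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
            report["vram_ok"] = total_gb >= min_vram_gb
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

The check covers only the first visible GPU and does not verify the CUDA or PyTorch version numbers.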

Date

  • Experiments run: 2026-05-01
  • Total compute: ~70 min on a single GPU