Struct2Seq-GNN

Model Description

Struct2Seq-GNN is a lightweight, 6-layer Graph Neural Network designed for inverse protein folding (structure-to-sequence prediction). By mapping the 3D spatial coordinates of protein backbones to their corresponding amino acid sequences, this model serves as a foundational tool for computational protein engineering and structural bioinformatics workflows.
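
This card does not say how a backbone is converted into a graph. As a minimal sketch, assume one node per residue placed at its Cα atom, with edges to the k nearest residues in space. This is a common choice in inverse-folding GNNs but is not confirmed for this model. The function name `knn_edges` and the toy coordinates are hypothetical.

```python
import math

def knn_edges(ca_coords, k=2):
    """Return directed edges (i, j) linking each residue i to its k nearest
    neighbors j, measured by Euclidean distance between C-alpha atoms.

    Assumption: one node per residue at its C-alpha position. The actual
    featurization used by Struct2Seq-GNN is not documented in this card.
    """
    edges = []
    for i, xi in enumerate(ca_coords):
        # Distance from residue i to every other residue, paired with its index.
        dists = [(math.dist(xi, xj), j) for j, xj in enumerate(ca_coords) if j != i]
        dists.sort()
        # Keep only the k closest residues as neighbors of i.
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Toy backbone: four residues spaced 3.8 Å apart along the x axis.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = knn_edges(toy_ca, k=2)
```

In a message-passing network, each layer would then update a residue's representation from the residues listed as its neighbors, so stacking layers widens the structural context each position sees.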

Intended Uses & Limitations

  • Primary Use: Computational protein design, specifically generating plausible sequences for novel or heavily modified protein backbones.
  • Limitations: This is a lightweight architecture built as an independent research project. It reaches ~33% native sequence recovery on validation data, but it is not intended for designing clinical therapeutics without further validation and optimization.

Training Data & Procedure

  • Dataset: Trained on biological protein assemblies from the PDB, clustered at a 30% sequence identity cutoff to prevent data leakage.
  • Data Augmentation: During training, Gaussian noise with a standard deviation of 0.1 Å was added to all input atomic coordinates. This augmentation discourages the model from "reading out" the native sequence from over-refined crystal artifacts and pushes it to learn the underlying biophysics of the fold instead.
  • Hardware: Trained for ~65 epochs on a 4-GPU HPC cluster.
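
The coordinate-noise augmentation described above can be sketched as follows. This is an illustration under stated assumptions, not the training code: the function name `add_coordinate_noise`, the list-of-tuples layout, and the example coordinates are all hypothetical, and only the 0.1 Å standard deviation comes from this card.

```python
import random

def add_coordinate_noise(coords, sigma=0.1, rng=None):
    """Return a copy of coords with independent Gaussian noise added to
    every x, y, z value. sigma is the standard deviation in angstroms.

    Assumption: coords is a list of (x, y, z) tuples, one per atom.
    """
    rng = rng or random.Random()
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]

# Made-up backbone atom positions in angstroms, for illustration only.
backbone = [(12.1, 4.3, -0.7), (13.4, 4.9, -0.2), (14.2, 3.8, 0.5)]

# A seeded generator makes the example reproducible.
noisy = add_coordinate_noise(backbone, sigma=0.1, rng=random.Random(0))
```

Noise of this kind is typically resampled every time a structure is drawn during training and is left out at inference, so the model never sees exactly the same coordinates twice.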

Evaluation Metrics

Results on held-out validation data:

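  • Global Sequence Recovery: ~33% validation accuracy across all residues. (Natural proteins that share more than ~30% sequence identity usually adopt the same fold, so recovery in this range is an encouraging sign, though not proof, that generated sequences are compatible with the target 3D fold.)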
  • Convergence: Validation loss plateaued smoothly at ~2.236.
  • Binding Pocket Recovery (5.0 Å): Not yet calculated.
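
To make the two reported numbers concrete, the sketch below computes sequence recovery for a pair of sequences and converts a loss value into a perplexity. It assumes the reported loss is a mean per-residue cross-entropy in nats over the 20 standard amino acids, which this card does not state. The function names and the two example sequences are hypothetical.

```python
import math

def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue matches the native one."""
    if len(predicted) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)

def perplexity(mean_cross_entropy_nats):
    """Effective number of equally likely choices per residue.

    Assumption: the loss is a mean cross-entropy measured in nats.
    """
    return math.exp(mean_cross_entropy_nats)

# Made-up sequences, for illustration only.
native = "MKTAYIAKQR"
predicted = "MKSAYLAKEK"
recovery = sequence_recovery(predicted, native)

# Reported validation loss versus a uniform guess over 20 amino acids.
ppl = perplexity(2.236)
uniform_loss = math.log(20)
```

Under that assumption, a loss of ~2.236 corresponds to a perplexity of roughly 9.4, compared with 20 for a model that guesses uniformly (loss of about 3.0).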
