WSobo committed on
Commit 7170613 · verified · 1 Parent(s): fbd3650

Update README.md

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -1,3 +1,22 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # Struct2Seq-GNN
+
+ ## Model Description
+ Struct2Seq-GNN is a lightweight, 6-layer graph neural network for inverse protein folding (structure-to-sequence prediction). It takes the 3D coordinates of a protein backbone as input and predicts an amino acid sequence compatible with that structure, making it a building block for computational protein engineering and structural bioinformatics workflows.
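
To make the structure-to-graph step concrete, here is a minimal sketch of one common way to turn backbone coordinates into graph edges. It assumes a k-nearest-neighbour graph over C-alpha atoms, which is a frequent choice in inverse folding models but is not confirmed by this README; the function name `knn_edges`, the value of `k`, and the toy coordinates are all hypothetical.

```python
import math

def knn_edges(ca_coords, k=3):
    """Return directed edges (i, j) linking each residue i to its k nearest neighbours j."""
    edges = []
    for i, ci in enumerate(ca_coords):
        # Distance from residue i to every other residue, paired with that residue's index.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i]
        dists.sort()
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Toy backbone (assumption): four C-alpha atoms spaced 3.8 Å apart along the x axis.
toy_backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = knn_edges(toy_backbone, k=2)
```

A GNN would then pass messages along these edges; the brute-force distance loop is only suitable for illustration and would normally be replaced by a vectorized or tree-based neighbour search.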
+
+ ## Intended Uses & Limitations
+ * **Primary Use:** Computational protein design: generating plausible sequences for novel or heavily modified protein backbones.
+ * **Limitations:** This is a lightweight architecture built as an independent research project. Although it recovers about a third of native residues on validation data, its designs should not be used for clinical therapeutics without further validation and optimization.
+
+ ## Training Data & Procedure
+ * **Dataset:** Trained on biological protein assemblies from the PDB, clustered at a 30% sequence identity cutoff to limit data leakage between training and validation data.
+ * **Data Augmentation:** During training, Gaussian noise with a standard deviation of 0.1 Å was added to all input atomic coordinates. This augmentation discourages the model from "reading out" the native sequence from over-refined crystal-structure artifacts and pushes it to learn the underlying biophysics of the fold.
+ * **Hardware:** Trained for ~65 epochs on a 4-GPU HPC cluster.
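
The point of clustering by sequence identity is that whole clusters, not individual chains, are assigned to a split. The sketch below illustrates that idea under stated assumptions: the README does not describe the actual split procedure, and the function `split_by_cluster`, the chain IDs, the cluster labels, and the validation fraction are hypothetical.

```python
import random

def split_by_cluster(cluster_of, val_fraction=0.1, seed=0):
    """Split chain IDs into train/validation so that no cluster spans both sets."""
    clusters = sorted(set(cluster_of.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Hold out at least one whole cluster for validation.
    n_val = max(1, round(len(clusters) * val_fraction))
    val_clusters = set(clusters[:n_val])
    train = sorted(c for c, cl in cluster_of.items() if cl not in val_clusters)
    val = sorted(c for c, cl in cluster_of.items() if cl in val_clusters)
    return train, val

# Hypothetical chain -> cluster assignments, e.g. from a 30% identity clustering run.
cluster_of = {"1abc_A": "c1", "2xyz_A": "c1", "3def_B": "c2", "4ghi_A": "c3", "5jkl_C": "c3"}
train_ids, val_ids = split_by_cluster(cluster_of, val_fraction=0.34)
```

Because every chain in a cluster lands on the same side of the split, no validation chain has a close homolog (above the clustering cutoff) in the training set.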
+
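
The coordinate-noise augmentation described above can be sketched in a few lines. This is an illustration rather than the project's training code: it assumes independent noise on each x, y, z component, and the function name `add_coordinate_noise`, the `seed` parameter, and the toy coordinates are hypothetical.

```python
import random

def add_coordinate_noise(coords, sigma=0.1, seed=None):
    """Return a copy of coords with independent Gaussian noise (std sigma, in Å) on x, y, z."""
    rng = random.Random(seed)
    return [tuple(v + rng.gauss(0.0, sigma) for v in atom) for atom in coords]

# Toy coordinates (assumption): three backbone atoms, in Å.
backbone = [(0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0)]
noisy = add_coordinate_noise(backbone, sigma=0.1, seed=0)
```

In a training loop the noise would typically be redrawn every time a structure is loaded, so the model never sees exactly the same coordinates twice.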
+ ## Evaluation Metrics
+ Measured on the validation set:
+ * **Global Sequence Recovery:** ~33% validation accuracy across all residues. (As a rule of thumb, natural proteins sharing more than ~30% sequence identity usually adopt the same fold, so recovery at this level is an encouraging sign, but it does not by itself show that generated sequences will fold into the target structure.)
+ * **Convergence:** Validation loss plateaued at ~2.236.
+ * **Binding Pocket Recovery (5.0 Å):** Not yet calculated.
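
For readers who want to reproduce or interpret these numbers, the sketch below shows how sequence recovery is usually computed and how a loss value can be converted to perplexity. It assumes the reported loss is a mean per-residue cross-entropy in nats, which the README does not state; the function names and toy sequences are hypothetical.

```python
import math

def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue equals the native residue."""
    if len(predicted) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)

def perplexity(cross_entropy_nats):
    """Per-residue perplexity for a mean cross-entropy given in nats."""
    return math.exp(cross_entropy_nats)

recovery = sequence_recovery("MKTAYIAKQR", "MKVAYLAKQL")  # toy sequences, 7 of 10 match
ppl = perplexity(2.236)        # reported validation loss, assumed to be in nats
uniform_loss = math.log(20)    # loss of a uniform guess over the 20 standard amino acids
```

Under that assumption, a loss of 2.236 corresponds to a perplexity of roughly 9.4, compared with 20 for a uniform guess over the standard amino acids (loss of about 3.0).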