adibvafa committed on
Commit 1bf20bd · verified · 1 Parent(s): 6ddb141

Upload README.md with huggingface_hub

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -30,7 +30,7 @@ datasets:
 
 GO-GPT is a decoder-only transformer model for predicting Gene Ontology (GO) terms from protein sequences. It combines ESM2 protein language model embeddings with an autoregressive decoder to generate GO term annotations across all three ontology aspects: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC).
 
-Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generation task, capturing hierarchical and cross-aspect dependencies to achieve state-of-the-art weighted F_max of 0.650.70.
+Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generation task, capturing hierarchical and cross-aspect dependencies to achieve state-of-the-art weighted F_max of 0.65-0.70.
 
 | Component | Description |
 |-----------|-------------|
@@ -38,7 +38,7 @@ Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generat
 | Decoder | 12-layer GPT with prefix causal attention |
 | Total Parameters | ~3.2B (3B ESM2 + 200M decoder) |
 
-**Training data:** 120,143 proteins, available at [wanglab/gogpt-training-data](https://huggingface.co/datasets/wanglab/gogpt-training-data).
+**Training data:** [wanglab/gogpt-training-data](https://huggingface.co/datasets/wanglab/gogpt-training-data)
 
 **Code:** [github.com/bowang-lab/BioReason-Pro/gogpt](https://github.com/bowang-lab/BioReason-Pro/tree/main/gogpt)
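The README's component table mentions a decoder with "prefix causal attention": the ESM2 protein embeddings form a prefix that is attended to bidirectionally, while the generated GO-term tokens attend causally. A minimal sketch of that mask pattern (the function name and NumPy formulation are illustrative assumptions, not the repository's actual implementation):

```python
import numpy as np

def prefix_causal_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Boolean attention mask: entry [i, j] is True if position i may attend to j.

    Illustrative prefix-LM pattern: prefix positions (e.g. protein embedding
    tokens) attend to the whole prefix bidirectionally; later positions
    (e.g. generated GO-term tokens) attend causally to everything before them.
    """
    # Standard causal (lower-triangular) mask for all positions.
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    # Lift the causal restriction within the prefix block only.
    mask[:prefix_len, :prefix_len] = True
    return mask

mask = prefix_causal_mask(prefix_len=2, total_len=4)
```

Here positions 0-1 (prefix) see each other in both directions, position 3 sees positions 0-2, and no position sees past itself outside the prefix block.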