adibvafa committed on
Commit 1bf20bd · verified · 1 Parent(s): 6ddb141

Upload README.md with huggingface_hub

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -30,7 +30,7 @@ datasets:
 
 GO-GPT is a decoder-only transformer model for predicting Gene Ontology (GO) terms from protein sequences. It combines ESM2 protein language model embeddings with an autoregressive decoder to generate GO term annotations across all three ontology aspects: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC).
 
-Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generation task, capturing hierarchical and cross-aspect dependencies to achieve state-of-the-art weighted F_max of 0.650.70.
+Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generation task, capturing hierarchical and cross-aspect dependencies to achieve state-of-the-art weighted F_max of 0.65-0.70.
 
 | Component | Description |
 |-----------|-------------|
@@ -38,7 +38,7 @@ Unlike discriminative methods, GO-GPT treats GO prediction as a sequence generat
 | Decoder | 12-layer GPT with prefix causal attention |
 | Total Parameters | ~3.2B (3B ESM2 + 200M decoder) |
 
-**Training data:** 120,143 proteins, available at [wanglab/gogpt-training-data](https://huggingface.co/datasets/wanglab/gogpt-training-data).
+**Training data:** [wanglab/gogpt-training-data](https://huggingface.co/datasets/wanglab/gogpt-training-data)
 
 **Code:** [github.com/bowang-lab/BioReason-Pro/gogpt](https://github.com/bowang-lab/BioReason-Pro/tree/main/gogpt)
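The README's component table mentions a decoder with "prefix causal attention": the ESM2 protein embeddings form a prefix that is attended to bidirectionally, while the generated GO-term tokens attend causally. A minimal sketch of that mask pattern (the function name and NumPy formulation are illustrative assumptions, not the repository's actual implementation):

```python
import numpy as np

def prefix_causal_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Boolean attention mask: entry [i, j] is True if position i may attend to j.

    Illustrative prefix-LM pattern: prefix positions (e.g. protein embedding
    tokens) attend to the whole prefix bidirectionally; later positions
    (e.g. generated GO-term tokens) attend causally to everything before them.
    """
    # Standard causal (lower-triangular) mask for all positions.
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    # Lift the causal restriction within the prefix block only.
    mask[:prefix_len, :prefix_len] = True
    return mask

mask = prefix_causal_mask(prefix_len=2, total_len=4)
```

Here positions 0-1 (prefix) see each other in both directions, position 3 sees positions 0-2, and no position sees past itself outside the prefix block.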