probablybots committed on
Commit c3c7a6b · verified · 1 parent: 9e3dad5

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -4,6 +4,8 @@ license: other
 
  # AIDO.Protein-RAG-3B
 
+ [![License](https://img.shields.io/badge/license-GenBio_AI_Community_License-orange)](https://github.com/genbio-ai/ModelGenerator/blob/main/LICENSE)
+
  AIDO.Protein-RAG-3B (AIDO.RAGPLM) is a pretrained Retrieval-Augmented protein language model within an [AI-driven Digital Organism](https://arxiv.org/abs/2412.06993) framework. This model, along with [AIDO.RAGFold](https://www.biorxiv.org/content/10.1101/2024.12.02.626519v1), integrates pretrained protein language models with retrieved Multiple Sequence Alignments (MSA), enabling the incorporation of co-evolutionary information for structure prediction while compensating for limited MSA data through large-scale pretraining.
 
  AIDO.Protein-RAG-3B outperforms single-sequence protein language models in perplexity, contact prediction, and fitness prediction. When used as a feature extractor for structure prediction in [AIDO.RAGFold](https://www.biorxiv.org/content/10.1101/2024.12.02.626519v1), it achieves TM-scores comparable to AlphaFold2 with sufficient MSA data (8x faster runtime), and significantly surpasses AlphaFold2 in MSA-limited scenarios (∆TM-score=0.379, 0.116, and 0.059 for 0, 5, and 10 input sequences respectively).