---
license: mit
---

# EvoLlama

EvoLlama is a multimodal framework that connects a structure-based protein encoder, a sequence-based protein encoder, and an LLM for protein understanding through a two-stage training process. For more details, please refer to our paper: [EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations](https://arxiv.org/abs/2412.11618).

## Quickstart

For more details, please refer to our GitHub repository.
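As a minimal sketch, the checkpoints for a single training stage could be fetched with `huggingface_hub`. The stage subdirectories below come from the Model Family table on this card; the `repo_id` is an assumption and should be replaced with this repository's actual id.

```python
from huggingface_hub import snapshot_download

# Checkpoint subdirectories listed on this model card, keyed by stage.
STAGE_DIRS = {
    "projection_tuning": "projection_tuning/protein_mpnn_esm2_650m",
    "supervised_fine_tuning": "supervised_fine_tuning/protein_mpnn_esm2_650m",
}

def stage_patterns(stage: str) -> list[str]:
    """Build allow_patterns that restrict the download to one stage's files."""
    return [f"{STAGE_DIRS[stage]}/*"]

if __name__ == "__main__":
    # NOTE: repo_id is a placeholder assumption, not confirmed by this card.
    local_dir = snapshot_download(
        repo_id="nwliu/EvoLlama",
        allow_patterns=stage_patterns("supervised_fine_tuning"),
    )
    print(local_dir)
```

Restricting the snapshot with `allow_patterns` avoids downloading both stages when only one set of weights is needed.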

## Model Family

| Models | Stages | Datasets | PDB | Paths |
| --- | --- | --- | --- | --- |
| EvoLlama (ProteinMPNN + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | `projection_tuning/protein_mpnn_esm2_650m` |
| EvoLlama (ProteinMPNN + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | `supervised_fine_tuning/protein_mpnn_esm2_650m` |
| EvoLlama (GearNet + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | Coming soon ... |
| EvoLlama (GearNet + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | Coming soon ... |

## Model Architecture

EvoLlama is initialized with the weights of the following models:

| Models | Links |
| --- | --- |
| ProteinMPNN | Link |
| GearNet | Link |
| ESM-2 650M (`facebook/esm2_t33_650M_UR50D`) | Link |
| Llama-3 (`meta-llama/Meta-Llama-3-8B-Instruct`) | Link |
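For the sequence branch, the ESM-2 encoder named above is available directly through `transformers`. The sketch below only extracts per-residue embeddings from the base ESM-2 model; it does not reproduce EvoLlama's projection layers or training, and the example sequence is arbitrary.

```python
import torch
from transformers import AutoTokenizer, EsmModel

# ESM-2 650M, as listed in the Model Architecture table.
MODEL_ID = "facebook/esm2_t33_650M_UR50D"

def embed_sequence(sequence: str) -> torch.Tensor:
    """Return per-residue embeddings from the pre-trained ESM-2 encoder."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = EsmModel.from_pretrained(MODEL_ID)
    model.eval()
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, sequence length + special tokens, hidden size)
    return outputs.last_hidden_state

if __name__ == "__main__":
    embeddings = embed_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
    print(embeddings.shape)
```

In EvoLlama these representations are projected into the LLM's embedding space during the two training stages described above.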

## Citation

```bibtex
@misc{liu2024evollama,
    title={EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations},
    author={Nuowei Liu and Changzhi Sun and Tao Ji and Junfeng Tian and Jianxin Tang and Yuanbin Wu and Man Lan},
    year={2024},
    eprint={2412.11618},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11618},
}
```