---
license: mit
---

# EvoLlama

EvoLlama is a multimodal framework that connects a structure-based protein encoder, a sequence-based protein encoder, and an LLM for protein understanding through a two-stage training process. For more details, please refer to our paper: [EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations](https://arxiv.org/abs/2412.11618).

## Quickstart

For more details, please refer to our GitHub repository.
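As a minimal sketch, the checkpoints for a single training stage could be fetched with `huggingface_hub`. The stage subdirectories below come from the Model Family table on this card; the `repo_id` is an assumption and should be replaced with this repository's actual id.

```python
from huggingface_hub import snapshot_download

# Checkpoint subdirectories listed on this model card, keyed by stage.
STAGE_DIRS = {
    "projection_tuning": "projection_tuning/protein_mpnn_esm2_650m",
    "supervised_fine_tuning": "supervised_fine_tuning/protein_mpnn_esm2_650m",
}

def stage_patterns(stage: str) -> list[str]:
    """Build allow_patterns that restrict the download to one stage's files."""
    return [f"{STAGE_DIRS[stage]}/*"]

if __name__ == "__main__":
    # NOTE: repo_id is a placeholder assumption, not confirmed by this card.
    local_dir = snapshot_download(
        repo_id="nwliu/EvoLlama",
        allow_patterns=stage_patterns("supervised_fine_tuning"),
    )
    print(local_dir)
```

Restricting the snapshot with `allow_patterns` avoids downloading both stages when only one set of weights is needed.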

## Model Family

| Models | Stages | Datasets | PDB | Paths |
| --- | --- | --- | --- | --- |
| EvoLlama (ProteinMPNN + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | `projection_tuning/protein_mpnn_esm2_650m` |
| EvoLlama (ProteinMPNN + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | `supervised_fine_tuning/protein_mpnn_esm2_650m` |
| EvoLlama (GearNet + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | Coming soon ... |
| EvoLlama (GearNet + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | Coming soon ... |

## Model Architecture

EvoLlama is initialized with the weights of the following models:

| Models | Links |
| --- | --- |
| ProteinMPNN | Link |
| GearNet | Link |
| ESM-2 650M (`facebook/esm2_t33_650M_UR50D`) | Link |
| Llama-3 (`meta-llama/Meta-Llama-3-8B-Instruct`) | Link |
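For the sequence branch, the ESM-2 encoder named above is available directly through `transformers`. The sketch below only extracts per-residue embeddings from the base ESM-2 model; it does not reproduce EvoLlama's projection layers or training, and the example sequence is arbitrary.

```python
import torch
from transformers import AutoTokenizer, EsmModel

# ESM-2 650M, as listed in the Model Architecture table.
MODEL_ID = "facebook/esm2_t33_650M_UR50D"

def embed_sequence(sequence: str) -> torch.Tensor:
    """Return per-residue embeddings from the pre-trained ESM-2 encoder."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = EsmModel.from_pretrained(MODEL_ID)
    model.eval()
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, sequence length + special tokens, hidden size)
    return outputs.last_hidden_state

if __name__ == "__main__":
    embeddings = embed_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
    print(embeddings.shape)
```

In EvoLlama these representations are projected into the LLM's embedding space during the two training stages described above.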

## Citation

```bibtex
@misc{liu2024evollama,
    title={EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations},
    author={Nuowei Liu and Changzhi Sun and Tao Ji and Junfeng Tian and Jianxin Tang and Yuanbin Wu and Man Lan},
    year={2024},
    eprint={2412.11618},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11618},
}
```