Protein-Protein Interaction Site Prediction
This model is a fine-tuned version of ESM2-3B [1] for protein-protein interaction site prediction. For each amino acid in a protein sequence, it predicts whether the residue is part of an interaction site (1) or not (0).
For more details on the training and testing of this model, refer to the article [...].
The GitHub repository to use with this model is available here: https://github.com/RitAreaSciencePark/PPI-Reps
The data used to train and evaluate this model is available in CSV format in this Zenodo repository: https://doi.org/10.5281/zenodo.18802482
How to Get Started with the Model
This code snippet shows how to load the model and use it to predict, for each amino acid in a protein sequence, the probability that it is part of a protein-protein interaction site.
import torch
from transformers import AutoModel, AutoTokenizer, AutoConfig
model_name = "evillegasgarcia/esm2-ppi-pdbbind-1"
# Load config
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# Load model using the custom remote code
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Move model to device and switch to inference mode (disables dropout)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
# Run over a sample sequence
sequence = "MKTVRQERLKSIVRILEAAKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer.encode(sequence, return_tensors="pt").to(device)
# Gradients are not needed for inference
with torch.no_grad():
    logits = model(inputs)["logits"]
# Convert logits to interaction-site probabilities
probabilities = torch.sigmoid(logits)
print(probabilities)
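To turn these probabilities into the 0/1 labels described above, a cutoff has to be chosen. The following is a minimal sketch, not the authors' post-processing: the helper name `to_binary_labels` and the 0.5 threshold are assumptions, and the toy values are made up. It takes a flat sequence of floats, which could be obtained from the snippet above with something like `probabilities.flatten().tolist()`, assuming the model returns one value per token. ESM tokenizers normally add special start and end tokens, so the first and last entries may need to be dropped before mapping values back to residues.

```python
from typing import List, Sequence


def to_binary_labels(
    probabilities: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the model card
) -> List[int]:
    """Map each probability to 1 (interaction site) or 0 (not a site)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Toy probabilities for a five-residue sequence (made-up values).
toy_probabilities = [0.10, 0.80, 0.55, 0.30, 0.92]
toy_labels = to_binary_labels(toy_probabilities)
print(toy_labels)
```

A different threshold may give a better trade-off between precision and recall, since interaction sites are usually a minority of residues.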
Training Details
The model was trained on the PDBbind dataset described in the paper.
We used the Adam optimizer with default hyperparameters and a weight decay of 0.01.
The learning rate was 1e-5, and we used gradient accumulation with a batch size of 8.
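The optimizer settings above can be illustrated with a small PyTorch sketch. This is not the authors' training script: the tiny linear model, the random data, and the binary cross-entropy loss are stand-ins, and reading the accumulation setting as "one optimizer step every 8 micro-batches" is an assumption. The card also does not say whether the weight decay was decoupled (as in `torch.optim.AdamW`), so plain `torch.optim.Adam` with `weight_decay` is used here.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the fine-tuned network: a tiny per-residue binary classifier.
model = nn.Linear(4, 1)

# Values from the model card: learning rate 1e-5, weight decay 0.01.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.01)
loss_fn = nn.BCEWithLogitsLoss()  # assumed loss for 0/1 labels

ACCUMULATION_STEPS = 8  # assumed reading of the accumulation setting

# Hypothetical data: 16 micro-batches of 10 residues with 4 features each.
micro_batches = [
    (torch.randn(10, 4), torch.randint(0, 2, (10, 1)).float())
    for _ in range(16)
]

optimizer_steps = 0
optimizer.zero_grad()
for step, (features, labels) in enumerate(micro_batches, start=1):
    loss = loss_fn(model(features), labels)
    # Scale so the accumulated gradient is the mean over the micro-batches.
    (loss / ACCUMULATION_STEPS).backward()
    if step % ACCUMULATION_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1

print(optimizer_steps)
```

Accumulating gradients this way trades wall-clock time for memory, which is a common choice when a 3B-parameter model leaves little room for large batches.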
Evaluation
The model was evaluated on the ZK448 benchmark, which is available from the Zenodo repository and was originally curated by [3]. It achieves an accuracy of 0.74 and a Matthews correlation coefficient (MCC) of 0.35.
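For readers unfamiliar with the two metrics, the sketch below computes accuracy and MCC from binary labels using their standard definitions. The function name `accuracy_and_mcc` and the toy labels are illustrative assumptions. How the reported scores were aggregated (for example per residue over the whole benchmark, or averaged per protein) is not stated here, so this is not a reproduction of the evaluation.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_mcc(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy, MCC) for binary labels given as 0/1 sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Toy labels for eight residues (made-up values).
toy_true = [1, 1, 0, 0, 0, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 0, 0]
toy_accuracy, toy_mcc = accuracy_and_mcc(toy_true, toy_pred)
print(toy_accuracy, toy_mcc)
```

MCC is often reported alongside accuracy for this task because interaction-site residues are usually a minority class, and accuracy alone can look high for a predictor that labels almost everything as 0.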
References
- [1] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130.
- [2] Stringer, B., de Ferrante, H., Abeln, S., Heringa, J., Feenstra, K. A., & Haydarlou, R. (2022). PIPENN: protein interface prediction from sequence with an ensemble of neural nets. Bioinformatics, 38(8), 2111-2118.
- [3] Zhang, J., & Kurgan, L. (2018). Review and comparative assessment of sequence-based predictors of protein-binding residues. Briefings in Bioinformatics, 19(5), 821-837.