---
license: cc-by-nc-nd-4.0
---
# PepDoRA: A Modified Peptide-Specific Language Model via Weight-Decomposed Low-Rank Adaptation
In this work, we introduce PepDoRA, a novel peptide language model (pLM) that fine-tunes the state-of-the-art ChemBERTa-77M-MLM SMILES transformer on modified peptide SMILES via Weight-Decomposed Low-Rank Adaptation (DoRA), for downstream membrane permeability prediction and representation learning.
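For context, a minimal sketch of how a DoRA fine-tuning setup like this can be configured with the Hugging Face `peft` library is shown below. The rank, scaling factor, and target modules here are illustrative assumptions, not the exact configuration used to train PepDoRA:

```python
# A minimal sketch of DoRA fine-tuning via Hugging Face PEFT.
# Hyperparameters and target modules are illustrative, not PepDoRA's
# actual training configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForMaskedLM

base = AutoModelForMaskedLM.from_pretrained("DeepChem/ChemBERTa-77M-MLM")

config = LoraConfig(
    use_dora=True,                      # enable weight-decomposed low-rank adaptation
    r=16,                               # low-rank dimension (illustrative)
    lora_alpha=32,                      # scaling factor (illustrative)
    target_modules=["query", "value"],  # attention projections (illustrative)
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the DoRA adapter weights are trainable
```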
Here's how to extract PepDoRA embeddings for your input peptide:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model_name = "ChatterjeeLab/PepDoRA"
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Input peptide as a SMILES string
peptide = "CC(C)C[C@H]1NC(=O)[C@@H](C)NCCCCCCNC(=O)[C@H](CO)NC1=O"

# Tokenize the peptide
inputs = tokenizer(peptide, return_tensors="pt")

# Get the hidden states (embeddings) from the model
with torch.no_grad():
    outputs = model(**inputs)

# Extract the embeddings from the last hidden layer
last_hidden_state = outputs.hidden_states[-1]

# Print the embedding shape: (batch_size, sequence_length, hidden_dim)
print(last_hidden_state.shape)
```
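The per-token embeddings can be mean-pooled into a single fixed-length vector for downstream tasks such as membrane permeability prediction. The sketch below reuses `inputs` and `last_hidden_state` from the snippet above; `permeability_head` is a hypothetical, untrained linear probe included only to illustrate the plumbing:

```python
import torch.nn as nn

# Mean-pool over tokens, masking out padding positions
mask = inputs["attention_mask"].unsqueeze(-1)                      # (1, seq_len, 1)
pooled = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)   # (1, hidden_dim)

# Hypothetical downstream head (untrained, for illustration only)
permeability_head = nn.Linear(pooled.shape[-1], 1)
score = permeability_head(pooled)                                  # (1, 1)
```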
## Repository Authors
[Leyao Wang](mailto:leyao.wang@vanderbilt.edu), Undergraduate Intern in the Chatterjee Lab <br>
[Pranam Chatterjee](mailto:pranam.chatterjee@duke.edu), Assistant Professor at Duke University
Reach out to us with any questions!