---
license: cc-by-nc-nd-4.0
---
# PepDoRA: A Modified Peptide-Specific Language Model via Weight-Decomposed Low-Rank Adaptation
In this work, we introduce PepDoRA, a novel peptide language model (pLM) that fine-tunes the state-of-the-art ChemBERTa-77M-MLM SMILES transformer on modified peptide SMILES via Weight-Decomposed Low-Rank Adaptation (DoRA), for downstream membrane permeability prediction and representation learning.
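For context, a minimal sketch of how a DoRA fine-tuning setup like this can be configured with the Hugging Face `peft` library is shown below. The rank, scaling factor, and target modules here are illustrative assumptions, not the exact configuration used to train PepDoRA:

```python
# A minimal sketch of DoRA fine-tuning via Hugging Face PEFT.
# Hyperparameters and target modules are illustrative, not PepDoRA's
# actual training configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForMaskedLM

base = AutoModelForMaskedLM.from_pretrained("DeepChem/ChemBERTa-77M-MLM")

config = LoraConfig(
    use_dora=True,                      # enable weight-decomposed low-rank adaptation
    r=16,                               # low-rank dimension (illustrative)
    lora_alpha=32,                      # scaling factor (illustrative)
    target_modules=["query", "value"],  # attention projections (illustrative)
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the DoRA adapter weights are trainable
```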
Here's how to extract PepDoRA embeddings for your input peptide:
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model_name = "ChatterjeeLab/PepDoRA"
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Input peptide as a SMILES string
peptide = "CC(C)C[C@H]1NC(=O)[C@@H](C)NCCCCCCNC(=O)[C@H](CO)NC1=O"

# Tokenize the peptide
inputs = tokenizer(peptide, return_tensors="pt")

# Get the hidden states (embeddings) from the model
with torch.no_grad():
    outputs = model(**inputs)

# Extract the embeddings from the last hidden layer
last_hidden_state = outputs.hidden_states[-1]

# Print the embedding shape: (batch_size, sequence_length, hidden_dim)
print(last_hidden_state.shape)
```
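The per-token embeddings can be mean-pooled into a single fixed-length vector for downstream tasks such as membrane permeability prediction. The sketch below reuses `inputs` and `last_hidden_state` from the snippet above; `permeability_head` is a hypothetical, untrained linear probe included only to illustrate the plumbing:

```python
import torch.nn as nn

# Mean-pool over tokens, masking out padding positions
mask = inputs["attention_mask"].unsqueeze(-1)                      # (1, seq_len, 1)
pooled = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)   # (1, hidden_dim)

# Hypothetical downstream head (untrained, for illustration only)
permeability_head = nn.Linear(pooled.shape[-1], 1)
score = permeability_head(pooled)                                  # (1, 1)
```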
## Repository Authors
[Leyao Wang](mailto:leyao.wang@vanderbilt.edu), Undergraduate Intern in the Chatterjee Lab <br>
[Pranam Chatterjee](mailto:pranam.chatterjee@duke.edu), Assistant Professor at Duke University
Reach out to us with any questions!