# LAMP HydrAMP AA tokenizer
Peptide tokenizer used by the LAMP HydrAMP Hub models: it maps amino-acid strings to fixed-length token IDs, padding or truncating to the HydrAMP sequence length. Load it with `trust_remote_code=True`, because the tokenizer class ships in this repo.

If you publish results or reuse HydrAMP tokenization, please cite the original Nature Communications paper (Szymczak et al., 2023); the Citation section at the bottom of this README has the BibTeX entry and links.
Tokenizer repo: `pszmk/hydramp-aa-tokenizer`
## Load

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "pszmk/hydramp-aa-tokenizer",
    revision="main",
    trust_remote_code=True,
)
```
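A quick sanity check of the fixed-length behavior. This is a sketch: `max_length=25` is only an illustrative value and the example peptide is arbitrary; in practice the length should come from the paired model's config, as in the next section.

```python
# Shorter peptides are padded up to max_length; longer ones are truncated.
batch = tokenizer(
    ["GIGKFLHSAK"],
    padding="max_length",
    truncation=True,
    max_length=25,  # illustrative; use model.config.sequence_length with a model
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([1, 25]): one sequence, fixed length
```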
## Use with a HydrAMP model

Point `AutoModel.from_pretrained` at your HydrAMP model repo (pin the same revision if you version the model and tokenizer together), then tokenize before calling `encode` / `forward`:
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "pszmk/hydramp"
tokenizer_id = "pszmk/hydramp-aa-tokenizer"

tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_id,
    revision="main",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_id,
    revision="main",
    trust_remote_code=True,
)
model.eval()

# Pad/truncate to the sequence length the model was trained with.
batch = tokenizer(
    ["ACDEFGHIKLMNPQRSTVWY"],
    padding="max_length",
    truncation=True,
    max_length=model.config.sequence_length,
    return_tensors="pt",
)

with torch.no_grad():
    # The encoder returns the parameters of the latent Gaussian.
    mean, log_std = model.encoder.encode(batch["input_ids"])
```
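HydrAMP is a conditional variational autoencoder (see the Citation section), so `mean` and `log_std` parameterize a diagonal Gaussian over the latent space. Below is a minimal sketch of drawing a latent sample via the standard reparameterization trick; it is plain PyTorch and assumes nothing about the model beyond the two tensors returned above.

```python
# Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, I).
eps = torch.randn_like(mean)
z = mean + torch.exp(log_std) * eps  # one latent sample per input peptide
```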
## Citation

The HydrAMP architecture and original model were introduced by Szymczak et al. in Nature Communications (2023). When you refer to HydrAMP or build on this work, please cite:
```bibtex
@article{szymczak_discovering_2023,
  title = {Discovering highly potent antimicrobial peptides with deep generative model {HydrAMP}},
  volume = {14},
  issn = {2041-1723},
  url = {https://www.nature.com/articles/s41467-023-36994-z},
  doi = {10.1038/s41467-023-36994-z},
  abstract = {Antimicrobial peptides emerge as compounds that can alleviate the global health hazard of antimicrobial resistance, prompting a need for novel computational approaches to peptide generation. Here, we propose HydrAMP, a conditional variational autoencoder that learns lower-dimensional, continuous representation of peptides and captures their antimicrobial properties. The model disentangles the learnt representation of a peptide from its antimicrobial conditions and leverages parameter-controlled creativity. HydrAMP is the first model that is directly optimized for diverse tasks, including unconstrained and analogue generation and outperforms other approaches in these tasks. An additional preselection procedure based on ranking of generated peptides and molecular dynamics simulations increases experimental validation rate. Wet-lab experiments on five bacterial strains confirm high activity of nine peptides generated as analogues of clinically relevant prototypes, as well as six analogues of an inactive peptide. HydrAMP enables generation of diverse and potent peptides, making a step towards resolving the antimicrobial resistance crisis.},
  language = {en},
  number = {1},
  journal = {Nature Communications},
  author = {Szymczak, Paulina and Możejko, Marcin and Grzegorzek, Tomasz and Jurczak, Radosław and Bauer, Marta and Neubauer, Damian and Sikora, Karol and Michalski, Michał and Sroka, Jacek and Setny, Piotr and Kamysz, Wojciech and Szczurek, Ewa},
  month = mar,
  year = {2023},
  keywords = {Computational models, Machine learning, Protein design},
  pages = {1453},
}
```
- DOI: 10.1038/s41467-023-36994-z
- Article: https://www.nature.com/articles/s41467-023-36994-z