---
library_name: transformers
tags:
- lamp
- hydramp
- tokenizer
- amino-acids
---

# LAMP HydrAMP AA tokenizer

Peptide tokenizer used by **LAMP HydrAMP** Hub models: maps amino-acid strings to fixed-length token IDs (padding/truncation to the HydrAMP sequence length). Load with ``trust_remote_code=True`` because the class ships in this repo.

When you publish results or reuse HydrAMP tokenization, cite the original *Nature Communications* paper (Szymczak *et al.*, 2023); **Citation** at the bottom of this README has BibTeX and links.

**Tokenizer repo:** `pszmk/hydramp-aa-tokenizer`

## Load

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "pszmk/hydramp-aa-tokenizer",
    revision="main",
    trust_remote_code=True,
)
```

## Use with a HydrAMP model

Point ``AutoModel.from_pretrained`` at your HydrAMP **model** repo (same ``revision`` if you version them together), then tokenize before ``encode`` / ``forward``:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "pszmk/hydramp"
tokenizer_id = "pszmk/hydramp-aa-tokenizer"

tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_id,
    revision="main",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    model_id,
    revision="main",
    trust_remote_code=True,
)
model.eval()

batch = tokenizer(
    ["ACDEFGHIKLMNPQRSTVWY"],
    padding="max_length",
    truncation=True,
    max_length=model.config.sequence_length,
    return_tensors="pt",
)

with torch.no_grad():
    mean, log_std = model.encoder.encode(batch["input_ids"])
```

## Citation

The **HydrAMP** architecture and original model were introduced by Szymczak *et al.* in *Nature Communications* (2023).
When you refer to HydrAMP or build on this work, please cite:

```bibtex
@article{szymczak_discovering_2023,
  title = {Discovering highly potent antimicrobial peptides with deep generative model {HydrAMP}},
  volume = {14},
  issn = {2041-1723},
  url = {https://www.nature.com/articles/s41467-023-36994-z},
  doi = {10.1038/s41467-023-36994-z},
  abstract = {Antimicrobial peptides emerge as compounds that can alleviate the global health hazard of antimicrobial resistance, prompting a need for novel computational approaches to peptide generation. Here, we propose HydrAMP, a conditional variational autoencoder that learns lower-dimensional, continuous representation of peptides and captures their antimicrobial properties. The model disentangles the learnt representation of a peptide from its antimicrobial conditions and leverages parameter-controlled creativity. HydrAMP is the first model that is directly optimized for diverse tasks, including unconstrained and analogue generation and outperforms other approaches in these tasks. An additional preselection procedure based on ranking of generated peptides and molecular dynamics simulations increases experimental validation rate. Wet-lab experiments on five bacterial strains confirm high activity of nine peptides generated as analogues of clinically relevant prototypes, as well as six analogues of an inactive peptide. HydrAMP enables generation of diverse and potent peptides, making a step towards resolving the antimicrobial resistance crisis.},
  language = {en},
  number = {1},
  journal = {Nature Communications},
  author = {Szymczak, Paulina and Możejko, Marcin and Grzegorzek, Tomasz and Jurczak, Radosław and Bauer, Marta and Neubauer, Damian and Sikora, Karol and Michalski, Michał and Sroka, Jacek and Setny, Piotr and Kamysz, Wojciech and Szczurek, Ewa},
  month = mar,
  year = {2023},
  keywords = {Computational models, Machine learning, Protein design},
  pages = {1453},
}
```

- **DOI:** [10.1038/s41467-023-36994-z](https://doi.org/10.1038/s41467-023-36994-z)
- **Article:** [nature.com](https://www.nature.com/articles/s41467-023-36994-z)
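
As a rough illustration of the fixed-length encoding described at the top of this README (amino-acid strings mapped to token IDs, then truncated or padded), here is a minimal pure-Python sketch. The vocabulary order, the padding ID of `0`, and the helper name `encode_fixed_length` are assumptions made for the example; the actual mapping and sequence length come from the tokenizer class and model config in the Hub repos and may differ.

```python
# Illustrative only: the real vocabulary, special-token IDs, and sequence
# length are defined by the tokenizer class in the Hub repo and may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD_ID = 0  # assumed padding ID
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # assumed IDs 1..20


def encode_fixed_length(sequence: str, max_length: int) -> list[int]:
    """Map residues to IDs, truncate to max_length, then right-pad with PAD_ID."""
    ids = [VOCAB[aa] for aa in sequence.upper()[:max_length]]
    return ids + [PAD_ID] * (max_length - len(ids))


padded = encode_fixed_length("ACD", max_length=5)         # shorter than max_length
truncated = encode_fixed_length("ACDEFGH", max_length=5)  # longer than max_length
print(padded)
print(truncated)
```

Both calls return lists of exactly `max_length` IDs, which is the property that lets peptides of different lengths be stacked into one batch tensor.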