PeptideCLM
An improved version of PeptideCLM is available at https://huggingface.co/collections/aaronfeller/peptideclm-2
How to use aaronfeller/PeptideCLM-23M-all with Transformers (the model weights load directly; the tokenizer has to be loaded separately, as described below):
# Load the masked-language model directly from the Hugging Face Hub
from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("aaronfeller/PeptideCLM-23M-all")

PeptideCLM-23M-all is a peptide-trained chemical language model, pretrained with masked language modeling (MLM) on 10.8M peptides and 12.6M small molecules.
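To make "MLM pretraining" concrete, here is a minimal sketch of how training inputs are typically corrupted for masked language modeling. It assumes the standard BERT/RoBERTa recipe (select 15% of positions; of those, 80% become the mask token, 10% a random token, 10% stay unchanged). This card does not say which recipe PeptideCLM actually used, and the token ids, `mask_id` and `vocab_size` in the example are hypothetical.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style MLM corruption (assumed recipe, not confirmed for PeptideCLM).

    Each non-special position is selected with probability `mlm_prob`.
    A selected position becomes `mask_id` 80% of the time, a random
    vocabulary id 10% of the time, and is left unchanged 10% of the time.
    Labels hold the original id only at selected positions, so the loss
    is computed only where the model must reconstruct a token.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens; skip unselected positions.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # Otherwise the original token is kept in place.
    return inputs, labels


# Hypothetical ids: 0 and 2 act as start/end tokens, 4 as the mask token.
ids = [0] + list(range(5, 25)) + [2]
inputs, labels = mask_tokens(ids, mask_id=4, vocab_size=100,
                             special_ids=frozenset({0, 2}), rng=random.Random(0))
```

In Hugging Face Transformers, `DataCollatorForLanguageModeling` applies the same 80/10/10 scheme with a default `mlm_probability` of 0.15, so in practice you would use that collator rather than hand-rolling the masking.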
The tokenizer cannot be loaded with transformers. Instead, a custom tokenizer must be loaded from the 'tokenizer' directory in https://github.com/AaronFeller/PeptideCLM
An example script for this is provided in the repository. A short example is below (note: the tokenizer directory must be downloaded first):
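# Requires the 'tokenizer' directory from the GitHub repository in the working directory
from tokenizer.my_tokenizers import SMILES_SPE_Tokenizer

def get_tokenizer():
    # Vocabulary and split (merge) files shipped in the repository's 'tokenizer' directory
    vocab_file = 'tokenizer/new_vocab.txt'
    splits_file = 'tokenizer/new_splits.txt'
    tokenizer = SMILES_SPE_Tokenizer(vocab_file, splits_file)
    return tokenizer

tokenizer = get_tokenizer()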
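As a rough illustration of what an SPE tokenizer does (SPE presumably stands for SMILES Pair Encoding), the sketch below assumes a two-stage, BPE-like process: split SMILES into atom-level tokens with the regex commonly used for SMILES tokenization, then greedily merge adjacent token pairs in priority order. This is not the repository's `SMILES_SPE_Tokenizer`; the merge list is hypothetical, and the sketch does not read `new_splits.txt`, whose format is not described in this card.

```python
import re
from typing import Iterable, List, Tuple

# Atom-level SMILES regex widely used for molecule tokenization.
# Assumed here as the first stage; the repo's tokenizer may differ.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/"
    r"|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def atom_tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into bracket atoms, atoms, bonds, branches and ring digits."""
    return SMILES_REGEX.findall(smiles)


def apply_merges(tokens: List[str], merges: Iterable[Tuple[str, str]]) -> List[str]:
    """Apply merge rules one at a time, in priority order, joining each adjacent matching pair."""
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


smiles = "CC(=O)N[C@@H](C)C(=O)O"
atoms = atom_tokenize(smiles)  # 17 atom-level tokens
# Hypothetical merge rules, highest priority first.
spe_tokens = apply_merges(atoms, [("=", "O"), ("(", "=O"), ("(=O", ")")])
# spe_tokens == ["C", "C", "(=O)", "N", "[C@@H]", "(", "C", ")", "C", "(=O)", "O"]
```

Because merges only concatenate neighbouring tokens, joining the output always reproduces the original SMILES string; learned merges simply let frequent substructures (such as carbonyl groups in peptide backbones) become single vocabulary entries.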