# Pretrained model

This model is a pretrained autoregressive, GPT-style transformer trained on a large number of protein sequences. The pretraining task is "Sequence<...>", in which the model completes a protein sequence given a prompt in this format.

Dataset: https://huggingface.co/datasets/lamm-mit/GPTProteinPretrained

Load the pretrained model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

pretrained_model_name = 'lamm-mit/GPTProteinPretrained'

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name,
    trust_remote_code=True,
).to(device)
model.config.use_cache = False
```

Sample inference using the "Sequence<...>" task, where the model autocompletes a sequence starting with "AIIAA":

```python
import torch

device = 'cuda'

# Prompt in the "Sequence<...>" task format; the model completes the
# sequence starting from "AIIAA".
prompt = "Sequence<AIIAA"

inputs = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # illustrative generation length
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

To cite this work:

```
@article{WeiKaplanBuehler_2023,
  title   = {Generative pretrained autoregressive transformer graph neural network applied to the analysis and discovery of novel proteins},
  author  = {M.J. Buehler},
  journal = {J. Appl. Phys.},
  year    = {2023},
  volume  = {},
  pages   = {},
  url     = {https://doi.org/10.1063/5.0157367}
}
```
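Decoded generations follow the "Sequence<...>" task format described above. A minimal sketch of a post-processing helper that strips the task wrapper from a decoded string; the helper name and the assumption that a completed generation may end with a closing ">" are illustrative, not part of the model card:

```python
def extract_sequence(decoded: str) -> str:
    """Hypothetical helper: pull the protein sequence out of a decoded
    "Sequence<...>" completion, stopping at a closing ">" if one is present."""
    start = decoded.find("<")
    if start == -1:
        return ""  # not in the expected task format
    end = decoded.find(">", start + 1)
    body = decoded[start + 1:end] if end != -1 else decoded[start + 1:]
    return body.strip()

print(extract_sequence("Sequence<AIIAAGSSGG>"))  # -> AIIAAGSSGG
print(extract_sequence("Sequence<AIIAA"))        # -> AIIAA (no closing ">")
```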