---
license: mit
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
tags:
- protein-language-model
- plm
---

# ProtXLNet
The model was developed by Ahmed Elnaggar et al.; more information can be found in the [GitHub repository](https://github.com/agemagician/ProtTrans) and the [accompanying paper](https://ieeexplore.ieee.org/document/9477085). This repository is a fork of their [HuggingFace repository](https://huggingface.co/Rostlab/prot_xlnet/tree/main).
# Inference example
```python
from transformers import AutoTokenizer, AutoModel
import torch
import re

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("virtual-human-chc/prot_xlnet", use_fast=False)
model = AutoModel.from_pretrained("virtual-human-chc/prot_xlnet").eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Example protein sequences: residues are space-separated, and the rare
# amino acids U, Z, O, and B are mapped to X
sequences = ["A E T C Z A O", "S K T Z P"]
sequences = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences]

# Tokenize and move the inputs to the same device as the model
inputs = tokenizer(sequences, padding=True, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Extract per-residue embeddings
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state)
```
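
`outputs.last_hidden_state` contains one embedding per token. If a single fixed-size embedding per protein is needed, one common option is to mean-pool the token embeddings while ignoring padding positions. The snippet below is a minimal sketch of such pooling, reusing `inputs` and `outputs` from the example above; note that this simple version also averages over XLNet's special tokens.

```python
# Minimal sketch: mean-pool token embeddings into one vector per protein,
# masking out padding (reuses `inputs` and `outputs` from the example above)
mask = inputs["attention_mask"].unsqueeze(-1)           # [batch, seq_len, 1]
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # sum over real tokens
per_protein = summed / mask.sum(dim=1)                  # [batch, hidden_size]
print(per_protein.shape)
```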

# Copyright
Code derived from https://github.com/agemagician/ProtTrans is licensed under the MIT License, Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the [Academic Free License v3.0](https://choosealicense.com/licenses/afl-3.0/), Copyright (c) 2025 Ahmed Elnaggar. All other code is licensed under the MIT License, Copyright (c) 2025 Maksim Pavlov.