---
license: mit
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
tags:
- protein-language-model
- plm
---

# ProtXLNet

ProtXLNet was developed by Ahmed Elnaggar et al. More information is available in the [GitHub repository](https://github.com/agemagician/ProtTrans) and in the [accompanying paper](https://ieeexplore.ieee.org/document/9477085). This repository is a fork of their [HuggingFace repository](https://huggingface.co/Rostlab/prot_xlnet/tree/main).

# Inference example

```python
from transformers import AutoTokenizer, AutoModel
import torch
import re

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("virtual-human-chc/prot_xlnet", use_fast=False)
model = AutoModel.from_pretrained("virtual-human-chc/prot_xlnet").eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Example protein sequences, with residues separated by spaces
sequences = ["A E T C Z A O", "S K T Z P"]
# Map the rare or ambiguous residues U, Z, O and B to X
sequences = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences]

# Tokenize and extract embeddings
inputs = tokenizer(sequences, padding=True, return_tensors="pt")
# Move the input tensors to the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state)
```
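
The example above starts from sequences that are already space-separated. If your input is a plain string such as `"AETCZAO"`, a small helper can produce the same format. The following is a minimal sketch, not part of the original repository: `prepare_sequence` and `RARE_RESIDUES` are hypothetical names, and uppercasing the input is an assumption about how the raw data might look.

```python
import re

# Same residue set as the regex in the inference example above.
RARE_RESIDUES = re.compile(r"[UZOB]")


def prepare_sequence(raw: str) -> str:
    """Turn a raw amino-acid string into space-separated residues.

    Hypothetical helper: removes whitespace, uppercases (an assumption),
    maps U, Z, O and B to X, then puts one space between residues.
    """
    compact = "".join(raw.split()).upper()
    compact = RARE_RESIDUES.sub("X", compact)
    return " ".join(compact)


sequences = [prepare_sequence(s) for s in ["AETCZAO", "SKTZP"]]
print(sequences)  # ['A E T C X A X', 'S K T X P']
```

The resulting list can be passed to the tokenizer in place of the hand-written `sequences` list.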

# Copyright

Code derived from https://github.com/agemagician/ProtTrans is licensed under the MIT License, Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the [Academic Free License v3.0](https://choosealicense.com/licenses/afl-3.0/), Copyright (c) 2025 Ahmed Elnaggar. All other code is licensed under the MIT License, Copyright (c) 2025 Maksim Pavlov.