The model was developed by Ahmed Elnaggar et al. and more information can be found on the [GitHub repository](https://github.com/agemagician/ProtTrans) and in the [accompanying paper](https://ieeexplore.ieee.org/document/9477085). This repository is a fork of their [HuggingFace repository](https://huggingface.co/Rostlab/prot_xlnet/tree/main).
# Inference example

```python
from transformers import AutoTokenizer, AutoModel
import torch
import re

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("virtual-human-chc/prot_xlnet", use_fast=False)
model = AutoModel.from_pretrained("virtual-human-chc/prot_xlnet").eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Example protein sequences (space-separated amino acids);
# map rare amino acids (U, Z, O, B) to X
sequences = ["A E T C Z A O", "S K T Z P"]
sequences = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences]

# Tokenize and extract embeddings
inputs = tokenizer(sequences, padding=True, return_tensors="pt")
# Move inputs to the GPU if one is available
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state)
```
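
The example above prints the raw per-residue embeddings. If a single fixed-size vector per protein is needed (e.g. as input to a downstream classifier), one common approach is to mean-pool the token embeddings while masking out padding. This is a sketch, not part of the original example; `mean_pool` is a hypothetical helper, shown here with dummy tensors standing in for the model output:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings over valid (non-padding) positions only."""
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1)           # avoid division by zero
    return summed / counts

# Dummy tensors in place of outputs.last_hidden_state / inputs["attention_mask"]
hidden = torch.randn(2, 7, 1024)
attn = torch.tensor([[1] * 7, [1] * 5 + [0] * 2])
pooled = mean_pool(hidden, attn)
print(pooled.shape)  # torch.Size([2, 1024])
```

With the real model you would call `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`; the padded positions of the second sequence then contribute nothing to its embedding.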
# Copyright
Code derived from https://github.com/agemagician/ProtTrans is licensed under the MIT License, Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the [Academic Free License v3.0](https://choosealicense.com/licenses/afl-3.0/), Copyright (c) 2025 Ahmed Elnaggar. All other code is licensed under the MIT License, Copyright (c) 2025 Maksim Pavlov.