---
license: mit
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
tags:
- protein-language-model
- plm
---

# ProtXLNet
The model was developed by Ahmed Elnaggar et al.; more information can be found in the [GitHub repository](https://github.com/agemagician/ProtTrans) and the [accompanying paper](https://ieeexplore.ieee.org/document/9477085). This repository is a fork of their [HuggingFace repository](https://huggingface.co/Rostlab/prot_xlnet/tree/main).
# Inference example
```python
from transformers import AutoTokenizer, AutoModel
import torch
import re

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("virtual-human-chc/prot_xlnet", use_fast=False)
model = AutoModel.from_pretrained("virtual-human-chc/prot_xlnet").eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Example protein sequences: residues are space-separated, and the rare
# amino acids U, Z, O, and B are mapped to X
sequences = ["A E T C Z A O", "S K T Z P"]
sequences = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences]

# Tokenize and move the inputs to the same device as the model
inputs = tokenizer(sequences, padding=True, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Extract per-residue embeddings
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state)
```
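
`outputs.last_hidden_state` contains one embedding per token. If a single fixed-size embedding per protein is needed, one common option is to mean-pool the token embeddings while ignoring padding positions. The snippet below is a minimal sketch of such pooling, reusing `inputs` and `outputs` from the example above; note that this simple version also averages over XLNet's special tokens.

```python
# Minimal sketch: mean-pool token embeddings into one vector per protein,
# masking out padding (reuses `inputs` and `outputs` from the example above)
mask = inputs["attention_mask"].unsqueeze(-1)           # [batch, seq_len, 1]
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # sum over real tokens
per_protein = summed / mask.sum(dim=1)                  # [batch, hidden_size]
print(per_protein.shape)
```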

# Copyright
Code derived from https://github.com/agemagician/ProtTrans is licensed under the MIT License, Copyright (c) 2025 Ahmed Elnaggar. The ProtTrans pretrained models are released under the terms of the [Academic Free License v3.0](https://choosealicense.com/licenses/afl-3.0/), Copyright (c) 2025 Ahmed Elnaggar. All other code is licensed under the MIT License, Copyright (c) 2025 Maksim Pavlov.