---
library_name: transformers
pipeline_tag: feature-extraction
model_name: InstaDeepAI/IDP-ESM2-8M
---

# IDP-ESM2-8M

IDP-ESM2-8M is an ESM2-style encoder for learning representations of intrinsically disorded protein sequences, trained on [IDP-Euka-90](https://huggingface.co/datasets/InstaDeepAI/IDP-Euka-90).
This repository provides a Transformer encoder suitable for extracting sequence embeddings.

---

## Quick start: generate embeddings

The snippet below loads the ESM2 tokenizer (from `facebook/esm2_t6_8M_UR50D`) and the IDP-ESM2-8M model, runs a forward pass on two example sequences, and extracts token-level embeddings (one vector per position) for each sequence.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# --- Config ---
model_name = "InstaDeepAI/IDP-ESM2-8M"

# --- Load model and tokenizer ---
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModel.from_pretrained(model_name)
model.eval()

# (optional) use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# --- Input sequences ---
sequences = [
    "MDDNHYPHHHHNHHNHHSTSGGCGESQFTTKLSVNTFARTHPMIQNDLIDLDLISGSAFTMKSKSQQ",
    "PADRDLSSPFGSTVPGVGPNAAAASNAAAAAAAAATAGSNKHQTPPTTFR",
]

# --- Tokenize ---
inputs = tokenizer(
    sequences,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# --- Forward pass ---
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state  # shape: (batch, seq_len, hidden_dim)
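
# --- (sketch) One vector per sequence via masked mean pooling ---
# The forward pass above yields one vector per token. A common way to reduce
# this to a single vector per sequence is to average over non-padding positions.
# This is an illustrative assumption, not necessarily the pooling used by the
# model authors; `masked_mean_pool` is a hypothetical helper name. Any special
# tokens that the attention mask marks as 1 are included in the average.
def masked_mean_pool(hidden_states, attention_mask):
    """Average token embeddings over positions where attention_mask == 1."""
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden_dim)
    counts = mask.sum(dim=1).clamp(min=1.0)                      # guard against empty rows
    return summed / counts

# Possible usage with the tensors computed above:
# sequence_embeddings = masked_mean_pool(embeddings, inputs["attention_mask"])  # (batch, hidden_dim)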