---
library_name: transformers
pipeline_tag: feature-extraction
model_name: InstaDeepAI/IDP-ESM2-150M
---
# IDP-ESM2-150M
**IDP-ESM2-150M** is an ESM2-style encoder for intrinsically disorded protein sequence representation learning, trained on [IDP-Euka-90](https://huggingface.co/datasets/InstaDeepAI/IDP-Euka-90).
This repository provides a Transformer encoder suitable for extracting **per-sequence embeddings** (mean-pooled over residues with padding masked out).
---
## Quick start: generate embeddings
The snippet below loads the tokenizer and model, runs a forward pass on two sequences, and extracts the last hidden state (one embedding per residue) for each sequence.
```python
from transformers import AutoTokenizer, AutoModel
import torch

# --- Config ---
model_name = "InstaDeepAI/IDP-ESM2-150M"

# --- Load model and tokenizer ---
# The tokenizer comes from a base ESM2 checkpoint (ESM2 models share one vocabulary)
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModel.from_pretrained(model_name)
model.eval()

# (optional) use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# --- Input sequences ---
sequences = [
    "MDDNHYPHHHHNHHNHHSTSGGCGESQFTTKLSVNTFARTHPMIQNDLIDLDLISGSAFTMKSKSQQ",
    "PADRDLSSPFGSTVPGVGPNAAAASNAAAAAAAAATAGSNKHQTPPTTFR",
]

# --- Tokenize ---
inputs = tokenizer(
    sequences,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# --- Forward pass ---
with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state  # shape: (batch, seq_len, hidden_dim)
```
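The description above mentions per-sequence embeddings that are mean-pooled over residues with padding masked out. A minimal sketch of that pooling step follows. The helper name `mean_pool` is hypothetical and not part of this repository, and the sketch averages over every non-padding position, which includes the tokenizer's special tokens; whether the authors exclude those is not stated here. The toy tensors stand in for real model output so the block runs without downloading anything.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sequence, ignoring padded positions.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over hidden_dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Zero out padded positions, then sum over the sequence axis
    summed = (last_hidden_state * mask).sum(dim=1)
    # Number of real tokens per sequence; clamp guards against division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Toy check with synthetic tensors: one sequence, three positions, the last one padded
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
pooled = mean_pool(hidden, mask)  # shape: (batch, hidden_dim) = (1, 2)
```

With the quick-start variables, `mean_pool(embeddings, inputs["attention_mask"])` would return one vector per input sequence, with shape `(batch, hidden_dim)`.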