jeanq1 committed on
Commit 85ad5fb · verified · 1 Parent(s): ac92f27

Upload 2 files

Files changed (2):
  1. README.md +53 -3
  2. gitattributes +35 -0
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: mit
- ---
+ ---
+ library_name: transformers
+ pipeline_tag: feature-extraction
+ model_name: InstaDeepAI/IDP-ESM2-8M
+ ---
+
+ # IDP-ESM2-8M
+
+ **IDP-ESM2-8M** is an ESM2-style encoder for representation learning on intrinsically disordered protein (IDP) sequences, trained on [IDP-Euka-90](https://huggingface.co/datasets/jeanq1/IDP-Euka-90).
+ This repository provides a Transformer encoder for extracting **per-sequence embeddings**: the per-residue hidden states are mean-pooled, with padding positions masked out.
+
+ ---
+
+ ## Quick start: generate embeddings
+
+ The snippet below loads the tokenizer and model, runs a forward pass on two example sequences, and extracts an embedding for each sequence.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ # --- Config ---
+ model_name = "InstaDeepAI/IDP-ESM2-8M"
+
+ # --- Load model and tokenizer ---
+ # The tokenizer is loaded from the original ESM2 8M checkpoint on the Hub
+ tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
+ model = AutoModel.from_pretrained(model_name)
+ model.eval()
+
+ # (optional) use GPU if available
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+
+ # --- Input sequences ---
+ sequences = [
+     "MDDNHYPHHHHNHHNHHSTSGGCGESQFTTKLSVNTFARTHPMIQNDLIDLDLISGSAFTMKSKSQQ",
+     "PADRDLSSPFGSTVPGVGPNAAAASNAAAAAAAAATAGSNKHQTPPTTFR",
+ ]
+
+ # --- Tokenize (pads to the longest sequence in the batch) ---
+ inputs = tokenizer(
+     sequences,
+     return_tensors="pt",
+     padding=True,
+     truncation=True,
+ )
+ inputs = {k: v.to(device) for k, v in inputs.items()}
+
+ # --- Forward pass ---
+ with torch.no_grad():
+     outputs = model(**inputs)
+ token_embeddings = outputs.last_hidden_state  # shape: (batch, seq_len, hidden_dim)
+
+ # --- Mean-pool over tokens, masking out padding ---
+ # attention_mask is 1 for real tokens (including <cls>/<eos>) and 0 for padding
+ mask = inputs["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
+ embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
+ print(embeddings.shape)  # (batch, hidden_dim)
+ ```
+
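The model card describes per-sequence embeddings as residue-level hidden states mean-pooled with padding masked out. Below is a minimal PyTorch sketch of that pooling step in isolation. It assumes the inputs are `last_hidden_state` from the quick-start forward pass and the tokenizer's `attention_mask`. The helper name `masked_mean_pool` is hypothetical and is not part of this repository or of `transformers`. Random tensors stand in for real model outputs so that the sketch runs without downloading anything.

```python
import torch


def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average per-token vectors, ignoring padding positions.

    hidden_states:  (batch, seq_len, hidden_dim), e.g. outputs.last_hidden_state
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    returns:        (batch, hidden_dim)
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)                   # sum over non-padded positions
    counts = mask.sum(dim=1).clamp(min=1.0)                      # non-padded positions per sequence
    return summed / counts


# Toy check: random tensors stand in for model outputs (assumption, no model needed).
torch.manual_seed(0)
hidden = torch.randn(2, 5, 4)
mask = torch.tensor([[1, 1, 1, 1, 1],
                     [1, 1, 1, 0, 0]])  # second sequence has two padding positions
pooled = masked_mean_pool(hidden, mask)
print(pooled.shape)  # torch.Size([2, 4])
```

For ESM2 tokenizers, `attention_mask` is also 1 at the `<cls>` and `<eos>` positions. This sketch therefore averages those special tokens together with the residues. Excluding them as well would require a mask that also zeros the special-token positions.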
gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text