gabrielbianchin committed
Commit 93197c5 · 1 Parent(s): 0fef163

update readme

Files changed (1)
  1. README.md +44 -1
README.md CHANGED
@@ -39,9 +39,52 @@ SeqScreen computes cosine similarities between protein and molecule embeddings,
 ### Similarity
 
 ```python
-# code here
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+# proteins
+tokenizer_prot = AutoTokenizer.from_pretrained('facebook/esm2_t36_3B_UR50D')
+encoder_prot = AutoModel.from_pretrained('facebook/esm2_t36_3B_UR50D').eval()
+
+proteins = ["MKTFFVLLL", "ABCDE"]
+proteins = [" ".join(i) for i in proteins]
+inputs_prot = tokenizer_prot(proteins, return_tensors="pt", padding=True)
+
+with torch.no_grad():
+    outputs = encoder_prot(**inputs_prot)
+    hidden = outputs.last_hidden_state[:, :]
+    mask = inputs_prot['attention_mask'].unsqueeze(-1).float()
+    prot_rep = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-8)
+
+# molecules
+tokenizer_mol = AutoTokenizer.from_pretrained('SaeedLab/MolDeBERTa-base-123M-mlc')
+encoder_mol = AutoModel.from_pretrained('SaeedLab/MolDeBERTa-base-123M-mlc').eval()
+
+molecules = ["NCCc1nc(-c2ccccc2)cs1", "CC(=O)OCC(C)C"]
+inputs_mol = tokenizer_mol(molecules, return_tensors="pt", padding=True)
+
+with torch.no_grad():
+    outputs = encoder_mol(**inputs_mol)
+    hidden = outputs.last_hidden_state[:, :]
+    mask = inputs_mol['attention_mask'].unsqueeze(-1).float()
+    mol_rep = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-8)
+
+# seqscreen
+seqscreen = AutoModel.from_pretrained('SaeedLab/SeqScreen-Frozen', trust_remote_code=True).eval()
+
+with torch.no_grad():
+    outputs = seqscreen(prot=prot_rep, mol=mol_rep)
+
+print('Protein embeddings projected:', outputs.prot_rep)
+print('Molecule embeddings projected:', outputs.mol_rep)
+print('Cosine similarity:', outputs.similarity)
+
 ```
 
+The returned outputs are:
+- prot_rep: Projected embeddings for the protein inputs (512 dimensions).
+- mol_rep: Projected embeddings for the molecule inputs (512 dimensions).
+- similarity: Cosine similarity between proteins and molecules.
 
 ## Citation
 
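Two steps in the added README snippet may be easier to check in isolation: the attention-mask-weighted mean pooling and the cosine similarity that the README says SeqScreen reports. The sketch below reproduces both on toy data with plain Python lists, so it runs without `torch` or any model download. The helper names `masked_mean` and `cosine` are hypothetical, the vectors are made up, and the model's projection layers are not reproduced; this shows the standard definitions, not SeqScreen's actual implementation.

```python
import math


def masked_mean(hidden, mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    # Mirrors the clamp(min=1e-8) guard in the snippet: no division by zero.
    denom = max(count, 1e-8)
    return [t / denom for t in total]


def cosine(a, b):
    """Standard cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "hidden states": three token positions, two dimensions.
# The last position is padding and must not affect the pooled vector.
hidden = [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean(hidden, mask)
print(pooled)                       # [2.0, 0.0]
print(cosine(pooled, [1.0, 0.0]))   # 1.0 (same direction)
print(cosine(pooled, [0.0, 1.0]))   # 0.0 (orthogonal)
```

Because the padded position is masked out, its large values leave the pooled vector unchanged, which is why the snippet multiplies by the attention mask before summing.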