---
license: apache-2.0
datasets:
- Oxer11/Protein-Function-Annotation
language:
- en
tags:
- Protein Language Model
- AI for Drug Discovery
- AI for Science
---

# ESM-S

ESM-S (https://arxiv.org/abs/2402.05856) is a series of structure-informed protein language models trained on remote homology detection tasks to distill structural information into the models.
The corresponding datasets can be downloaded at https://huggingface.co/datasets/Oxer11/Protein-Function-Annotation.
The codebase can be found at https://github.com/DeepGraphLearning/esm-s.

![Training](./asset/training.png)

# Evaluation Performance

Freezing the model weights and training a 2-layer MLP on downstream function prediction tasks.
![Predictor](./asset/predictor.png)
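
The frozen-encoder setup can be illustrated with a minimal sketch. It is not the authors' training code: random vectors stand in for precomputed ESM-S embeddings, and the hidden size, ReLU activation, multi-label binary cross-entropy loss, and learning rate are all assumptions made for illustration. Only the MLP head is updated; the features are never modified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; random vectors stand in for frozen ESM-S embeddings.
n_proteins, embed_dim, hidden_dim, n_labels = 64, 16, 32, 3
features = rng.normal(size=(n_proteins, embed_dim))

# Synthetic multi-label targets produced by a hidden linear rule (illustration only).
true_w = rng.normal(size=(embed_dim, n_labels))
labels = (features @ true_w > 0).astype(float)

# Parameters of the 2-layer MLP head: Linear -> ReLU -> Linear.
w1 = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, n_labels))
b2 = np.zeros(n_labels)


def forward(x):
    """Return hidden activations and per-label logits."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden, hidden @ w2 + b2


def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy."""
    return float(np.mean(
        np.maximum(logits, 0.0) - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    ))


initial_loss = bce_with_logits(forward(features)[1], labels)

learning_rate = 0.5  # assumed value for this toy problem
for _ in range(300):
    hidden, logits = forward(features)
    probs = 1.0 / (1.0 + np.exp(-logits))

    # Backpropagate through the head only; the embeddings stay fixed.
    grad_logits = (probs - labels) / labels.size
    grad_w2 = hidden.T @ grad_logits
    grad_b2 = grad_logits.sum(axis=0)
    grad_hidden = grad_logits @ w2.T
    grad_hidden[hidden <= 0.0] = 0.0
    grad_w1 = features.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)

    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2

final_loss = bce_with_logits(forward(features)[1], labels)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In practice the `features` array would be replaced by embeddings computed once from the frozen model, which is what makes this evaluation cheap compared with full fine-tuning.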

Using ESM-S representations to retrieve similar proteins for function annotation.
![Retriever](./asset/retriever.png)
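
Retrieval-based annotation can be sketched as a nearest-neighbor lookup over embeddings. The example below is an assumption-laden illustration, not the method from the paper: it uses cosine similarity, a similarity-weighted vote over the top `k` neighbors, and a hypothetical toy database whose vectors and labels are invented for the example.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_annotations(query, database, k=2):
    """Score labels by summing the similarity of the k nearest neighbors.

    `database` is a list of (embedding, set_of_labels) pairs.
    """
    neighbors = sorted(
        ((cosine_similarity(query, emb), labels) for emb, labels in database),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, labels in neighbors:
        for label in labels:
            scores[label] += similarity
    return dict(scores)


# Hypothetical annotated proteins; real entries would hold ESM-S embeddings.
database = [
    ([1.0, 0.0], {"kinase"}),
    ([0.9, 0.1], {"kinase", "ATP binding"}),
    ([0.0, 1.0], {"hydrolase"}),
]
query = [1.0, 0.05]

scores = retrieve_annotations(query, database, k=2)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here the two nearest neighbors both carry the `kinase` label, so it receives the highest score, while `hydrolase` belongs only to the distant third entry and is never scored.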

# BibTeX
```
@article{zhang2024structureplm,
  title={Structure-Informed Protein Language Model},
  author={Zhang, Zuobai and Lu, Jiarui and Chenthamarakshan, Vijil and Lozano, Aurelie and Das, Payel and Tang, Jian},
  journal={arXiv preprint arXiv:2402.05856},
  year={2024}
}
```