---
license: mit
metrics:
- mae
base_model:
- facebook/esm2_t33_650M_UR50D
pipeline_tag: tabular-regression
tags:
- PLM
- GBT
- ESM2
- Regression
---



## BindPred: Gradient Boosted Trees on ESM2 Embeddings

# Model Overview
BindPred is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model. It is designed for binding affinity prediction tasks.

Pretrained Colab Notebook: https://colab.research.google.com/drive/1ndzICxVBUUBHffmi0KDtUXaKaMtqTz55
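
As a rough illustration of what "ESM2 embeddings" can look like as regressor input, the sketch below turns protein sequences into fixed-length vectors with the `facebook/esm2_t33_650M_UR50D` checkpoint. The pooling strategy is an assumption: this card does not say how per-residue embeddings were pooled or how the two binding partners were combined, so mean pooling over residues from the final layer is used here only as a common choice. The helper names `residue_mask` and `embed_sequences` are hypothetical.

```python
from typing import List


def residue_mask(attention_mask: List[int]) -> List[int]:
    """Keep only residue positions by dropping the first and last attended tokens.

    Assumes the tokenizer wraps each sequence in one start and one end token
    and pads on the right.
    """
    attended = [i for i, m in enumerate(attention_mask) if m == 1]
    mask = [0] * len(attention_mask)
    for i in attended[1:-1]:
        mask[i] = 1
    return mask


def embed_sequences(sequences: List[str],
                    model_name: str = "facebook/esm2_t33_650M_UR50D"):
    """Return one mean-pooled embedding per sequence as a 2-D tensor."""
    # Heavy dependencies are imported lazily so residue_mask stays usable alone.
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name).eval()

    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)

    # Assumed pooling: average over residue tokens only.
    masks = torch.tensor(
        [residue_mask(row) for row in batch["attention_mask"].tolist()],
        dtype=hidden.dtype,
    )
    summed = (hidden * masks.unsqueeze(-1)).sum(dim=1)
    counts = masks.sum(dim=1, keepdim=True).clamp(min=1)
    return summed / counts
```

Calling `embed_sequences(["MKTAYIAKQR"])` downloads the checkpoint on first use. Whether the resulting vectors match the features the pretrained BindPred models expect should be checked against the Colab notebook.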

# Available Pretrained Models

ACE2_RBD_BindPred.json	 

Predicts binding affinity between ACE2 (human and animals) and RBD proteins.

ESM2_BindPred.json	

General-purpose GBT model trained on ESM2 embeddings.


# Model Details
•	Base Model: ESM2

•	Architecture: Gradient Boosted Trees (CatBoostRegressor)

•	Framework: CatBoost

•	Task: Regression

# How to Use

Download Model from Hugging Face:

    from huggingface_hub import hf_hub_download

    # Download General model
    model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")

Load Model in CatBoost:

    from catboost import CatBoostRegressor

    model = CatBoostRegressor()
    model.load_model(model_path, format="cbm")
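
The model files are listed above with `.json` names, while the download example uses a `.cbm` name. A minimal sketch of a loader that picks the CatBoost format from the file extension is shown here, so either kind of file can be loaded. The helpers `infer_format` and `load_bindpred` are hypothetical names, not part of this repository, and the file name you pass should be whichever one actually exists in the repo.

```python
from pathlib import Path

# Map file extensions to the format names accepted by CatBoost's load_model.
_FORMATS = {".cbm": "cbm", ".json": "json"}


def infer_format(filename: str) -> str:
    """Return the CatBoost format string matching the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"Unsupported model file extension: {suffix!r}")
    return _FORMATS[suffix]


def load_bindpred(filename: str, repo_id: str = "hbp5181/BindPred"):
    """Download a model file from the Hub and load it into CatBoost."""
    # Imported lazily so infer_format can be used without these packages.
    from catboost import CatBoostRegressor
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    model = CatBoostRegressor()
    model.load_model(model_path, format=infer_format(filename))
    return model
```

With a loaded model, `model.predict(features)` returns one predicted value per row, assuming `features` is a 2-D array with the same columns, in the same order, as the embeddings used in training.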


# Training Details

•	Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)

•	Training Algorithm: CatBoost Gradient Boosting

•	Dataset: 

      ACE2 RBD: https://github.com/jbloomlab/SARSr-CoV_homolog_survey
      
      General: https://zenodo.org/records/14271435
      
•	Evaluation Metrics: RMSE, R^2
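
For reference, the metrics named above, together with the MAE listed in this card's metadata, can be computed as in the sketch below. It uses plain Python so the definitions are explicit. The function names and the numbers in the usage note are made up for illustration and are not results from this model.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with true values `[1.0, 2.0, 3.0, 4.0]` and predictions `[1.0, 2.0, 3.0, 6.0]`, RMSE is 1.0, MAE is 0.5, and R^2 is about 0.2.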

# Applications

•	Binding affinity predictions

# Limitations & Considerations

•	The model is trained on ESM2 embeddings and is limited by the quality of those embeddings.

•	Performance depends on the training dataset used.

•	The regressor itself is not a deep-learning model; it applies GBTs to precomputed ESM2 embeddings for fast, interpretable predictions.

# Citation

👤 Maintainer: hbp5181@psu.edu

📅 Last Updated: February 2025