hbp5181 committed on
Commit df718d6 · verified · 1 Parent(s): 53f810a

Create README.md

Files changed (1)
  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: mit
+ metrics:
+ - mae
+ base_model:
+ - facebook/esm2_t33_650M_UR50D
+ pipeline_tag: tabular-regression
+ tags:
+ - PLM
+ - GBT
+ - ESM2
+ - Regression
+ ---
+
+
+ ESM2-GBT: Gradient Boosted Trees on ESM2 Embeddings
+
+ 📌 Model Overview
+ The ESM2-GBT model is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model (facebook/esm2_t33_650M_UR50D). It is designed for protein regression tasks such as binding affinity prediction.
+
+
+ 🧪 Available Models:
+
+ | Model Name | Description |
+ | --- | --- |
+ | ACE2_RBD_ESM2-GBT.json | Predicts binding affinity between ACE2 and RBD proteins. |
+ | General_ESM2-GBT.json | General-purpose GBT model trained on ESM2 embeddings. |
+
+
+ 🏗 Model Details
+ • Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)
+
+ • Architecture: Gradient Boosted Trees (CatBoostRegressor)
+
+ • Framework: CatBoost
+
+ • Task: Regression
+
+ 🧑‍💻 How to Use
+
+ 1️⃣ Download Model from Hugging Face
+ from huggingface_hub import hf_hub_download
+
+ # Download the ACE2-RBD model. For the general model,
+ # set filename="General_ESM2-GBT.json" instead.
+ model_path = hf_hub_download(repo_id="hbp5181/ESM2-GBT", filename="ACE2_RBD_ESM2-GBT.json")
+
+ 2️⃣ Load Model in CatBoost
+ from catboost import CatBoostRegressor
+
+ # Load the downloaded JSON file into a CatBoost regressor.
+ model = CatBoostRegressor()
+ model.load_model(model_path, format="json")
+
+ # The model can now predict on features built from your own dataset.
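
The loading steps above stop before prediction. The sketch below shows one possible way to turn protein sequences into ESM2 features and pass them to the loaded regressor. It rests on assumptions this README does not confirm: that each sequence is represented by mean-pooling the last-layer token embeddings of facebook/esm2_t33_650M_UR50D (special tokens included), and that the resulting vector has the same width and column order the model was trained on. How ACE2 and RBD sequences were combined into a single feature vector is not stated here. The helper names `mean_pool`, `embed_sequences`, and `predict_affinity` are hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average one sequence's token vectors over positions where the mask is 1."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("sequence has no unmasked tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


def embed_sequences(sequences, model_name="facebook/esm2_t33_650M_UR50D"):
    """Return one mean-pooled ESM2 vector per sequence (assumed pooling scheme)."""
    # Heavy dependencies are imported lazily so mean_pool works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    esm = AutoModel.from_pretrained(model_name).eval()

    # Pad to a common length; the attention mask marks the real tokens.
    batch = tokenizer(list(sequences), return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = esm(**batch).last_hidden_state  # (batch, tokens, hidden_size)

    return [
        mean_pool(vectors, mask)
        for vectors, mask in zip(hidden.tolist(), batch["attention_mask"].tolist())
    ]


def predict_affinity(model, sequences):
    """Embed sequences and run them through an already loaded CatBoostRegressor."""
    import numpy as np

    features = np.asarray(embed_sequences(sequences), dtype=np.float32)
    # Assumption: the model expects exactly these pooled embedding columns.
    return model.predict(features)
```

With `model` loaded as in step 2, `predict_affinity(model, my_sequences)` would return one prediction per entry, where `my_sequences` is a list of amino-acid strings. If CatBoost reports a feature-count mismatch, the pooling or feature layout assumed here differs from the one used in training.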
+
+ 🔬 Training Details
+
+ • Feature Extraction: ESM2 embeddings (33-layer transformer, 650M parameters)
+ • Training Algorithm: CatBoost Gradient Boosting
+ • Dataset: your own dataset
+ • Evaluation Metrics: RMSE, R^2
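
For reference, the metrics named above (plus MAE, which the model card metadata lists) can be computed from held-out predictions as sketched below. This is a minimal standard-library example, not the evaluation script used for these models; the function name `regression_metrics` is hypothetical, and no reported score is reproduced here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # R^2 is undefined when every target value is identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a rough sense of whether a few outliers dominate the error.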
+
+ 📌 Applications
+
+ • Binding affinity predictions
+
+ 💡 Limitations & Considerations
+
+ • The model is trained on ESM2 embeddings and is limited by the quality of those embeddings.
+
+ • Performance depends on the training dataset used.
+
+ • The regressor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.
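
One way to act on the interpretability point above is to rank CatBoost's per-feature importances, which the library exposes through `get_feature_importance()`. The sketch below is illustrative only; the helper name `top_features` is hypothetical. Assuming the features are raw ESM2 embedding dimensions, a high-ranking index identifies an embedding coordinate, not a specific residue or biochemical property.

```python
def top_features(model, k=10):
    """Return the k most important (feature_index, importance) pairs."""
    # For a trained or loaded CatBoost model this yields one score per feature.
    # Depending on the model file, CatBoost may ask for a dataset via `data=`.
    importances = list(model.get_feature_importance())
    ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Calling `top_features(model, k=20)` on the loaded regressor would list the 20 embedding dimensions the trees rely on most.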
+
+ 📄 Citation
+
+ 👤 Maintainer: hbp5181@psu.edu
+
+ 📅 Last Updated: February 2025
+