---
license: mit
metrics:
- mae
base_model:
- facebook/esm2_t33_650M_UR50D
pipeline_tag: tabular-regression
tags:
- PLM
- GBT
- ESM2
- Regression
---
## BindPred: Gradient Boosted Trees on ESM2 Embeddings
# Model Overview
The BindPred model is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model. It is designed for binding affinity prediction tasks.
Pretrained Colab Notebook: https://colab.research.google.com/drive/1ndzICxVBUUBHffmi0KDtUXaKaMtqTz55
# Available Pretrained Models:
• ACE2_RBD_BindPred.json: Predicts binding affinity between ACE2 (human and animal) and RBD (receptor-binding domain) proteins.
• ESM2_BindPred.json: General-purpose GBT model trained on ESM2 embeddings.
# Model Details
• Base Model: ESM2
• Architecture: Gradient Boosted Trees (CatBoostRegressor)
• Framework: CatBoost
• Task: Regression
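The two-stage design listed above (an ESM2 encoder producing fixed-length vectors, followed by a CatBoost regressor) can be sketched as follows. This is a minimal sketch rather than the authors' pipeline: it assumes the Hugging Face `transformers` port of ESM2 and mean pooling over the final hidden layer, neither of which is stated in this card, and `batches` and `embed_sequences` are hypothetical helper names.

```python
from typing import Iterator, List, Sequence

# Base model named in this card's metadata.
ESM2_CHECKPOINT = "facebook/esm2_t33_650M_UR50D"


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_sequences(sequences: Sequence[str], batch_size: int = 4):
    """Return one mean-pooled ESM2 vector per sequence as a NumPy array.

    ASSUMPTION: mean pooling over the final hidden layer. The pooling
    used to train BindPred is not stated in this card.
    """
    # Heavy dependencies are imported lazily so `batches` stays lightweight.
    import numpy as np
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(ESM2_CHECKPOINT)
    model = EsmModel.from_pretrained(ESM2_CHECKPOINT).eval()

    pooled = []
    with torch.no_grad():
        for chunk in batches(sequences, batch_size):
            enc = tokenizer(chunk, return_tensors="pt", padding=True)
            hidden = model(**enc).last_hidden_state  # (batch, tokens, hidden)
            mask = enc["attention_mask"].unsqueeze(-1).to(hidden.dtype)
            # Average over non-padding tokens (special tokens are included).
            vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
            pooled.append(vectors.cpu().numpy())
    return np.concatenate(pooled, axis=0)
```

For this checkpoint each pooled vector has 1280 dimensions, so `embed_sequences(["MKTAYIAKQR"])` (a made-up sequence) would return an array of shape `(1, 1280)`. How the pretrained BindPred models expect the features for a binding pair to be arranged is not described in this card; the Colab notebook linked in the overview is the reference for that.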
# How to Use
Download Model from Hugging Face
```python
from huggingface_hub import hf_hub_download

# Download the general-purpose model
model_path = hf_hub_download(repo_id="hbp5181/BindPred", filename="ESM2_BindPred.cbm")
```
Load Model in CatBoost
```python
from catboost import CatBoostRegressor

model = CatBoostRegressor()
model.load_model(model_path, format="cbm")
```
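The model list names `.json` files while the download snippet requests a `.cbm` file, so the `format` passed to `load_model` has to match whichever file is actually fetched. The following is a minimal sketch, assuming the file extension reflects the serialization format; `catboost_format` and `load_bindpred` are hypothetical helper names, and the filenames available in the repository should be checked on the Hub.

```python
from pathlib import Path

# File extensions mapped to the `format` strings CatBoost's load_model accepts.
_FORMATS = {".cbm": "cbm", ".json": "json"}


def catboost_format(filename: str) -> str:
    """Infer the CatBoost model format from a filename's extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"Unsupported model extension: {suffix!r}")
    return _FORMATS[suffix]


def load_bindpred(filename: str, repo_id: str = "hbp5181/BindPred"):
    """Download a model file from the Hub and load it into a CatBoostRegressor."""
    # Imported lazily so `catboost_format` works without these packages.
    from catboost import CatBoostRegressor
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    model = CatBoostRegressor()
    model.load_model(model_path, format=catboost_format(filename))
    return model
```

With this helper, `load_bindpred("ESM2_BindPred.cbm")` follows the same steps as the snippet above, and a `.json` file would be loaded with `format="json"` instead.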
# Training Details
• Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)
• Training Algorithm: CatBoost Gradient Boosting
• Dataset (ACE2 RBD): https://github.com/jbloomlab/SARSr-CoV_homolog_survey
• Dataset (General): https://zenodo.org/records/14271435
• Evaluation Metrics: RMSE, R^2
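The metrics named above, together with the MAE listed in the card metadata, can be computed from held-out predictions as sketched here. This is an illustrative pure-Python version, not the authors' evaluation code; `regression_metrics` is a hypothetical helper name.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of the error distribution. In practice, `sklearn.metrics` provides equivalent functions such as `mean_absolute_error` and `r2_score`.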
# Applications
• Binding affinity predictions
# Limitations & Considerations
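• The model is trained on ESM2 embeddings, so its accuracy is limited by how well those embeddings represent the input proteins.
• Performance depends on the training dataset used; predictions for proteins unlike those in the training data may be less reliable.
• The regressor itself is not a deep-learning model; it applies GBTs to ESM2 embeddings for fast, interpretable predictions.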
# Citation
👤 Maintainer: hbp5181@psu.edu
📅 Last Updated: February 2025