hbp5181/BindPred

+---
+license: mit
+metrics:
+- mae
+base_model:
+- facebook/esm2_t33_650M_UR50D
+pipeline_tag: tabular-regression
+tags:
+- PLM
+- GBT
+- ESM2
+- Regression
+---
+ESM2-GBT: Gradient Boosted Trees on ESM2 Embeddings
+📌 Model Overview
+The ESM2-GBT model is a Gradient Boosted Trees (GBT) regressor trained on embeddings from Meta’s ESM2 protein language model (facebook/esm2_t33_650M_UR50D). It is designed for protein regression tasks such as binding affinity prediction.
+🧪 Available Models:
+Model Name	Description
+ACE2_RBD_ESM2-GBT.json	Predicts binding affinity between ACE2 and RBD proteins.
+General_ESM2-GBT.json	General-purpose GBT model trained on ESM2 embeddings.
+🏗 Model Details
+•	Base Model: ESM2 (facebook/esm2_t33_650M_UR50D)
+•	Architecture: Gradient Boosted Trees (CatBoostRegressor)
+•	Framework: CatBoost
+•	Task: Regression
+🧑‍💻 How to Use
+1️⃣ Download Model from Hugging Face
+from huggingface_hub import hf_hub_download
+# Download the ACE2-RBD model; for the general model, use filename="General_ESM2-GBT.json"
+model_path = hf_hub_download(repo_id="hbp5181/ESM2-GBT", filename="ACE2_RBD_ESM2-GBT.json")
+2️⃣ Load Model in CatBoost
+from catboost import CatBoostRegressor
+model = CatBoostRegressor()
+model.load_model(model_path, format="json")
+# Predict on your own data, e.g. model.predict(X), where X holds one row of ESM2 embedding features per sample
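The loaded regressor needs one fixed-length feature vector per sample. This card does not say how per-residue ESM2 outputs are reduced to such a vector, so the sketch below assumes mean pooling over non-padding residues. The helper name `mean_pool_embeddings` and the random stand-in data are hypothetical; only the embedding width of 1280 comes from the esm2_t33_650M_UR50D checkpoint.

```python
import numpy as np

EMBED_DIM = 1280  # hidden size of facebook/esm2_t33_650M_UR50D


def mean_pool_embeddings(residue_embeddings, attention_mask):
    """Average per-residue embeddings over real (non-padding) positions.

    Assumption: mean pooling is used; the card does not state the pooling method.
    residue_embeddings: array of shape (batch, length, dim)
    attention_mask: array of shape (batch, length), 1 = residue, 0 = padding
    """
    emb = np.asarray(residue_embeddings, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]
    summed = (emb * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against empty rows
    return summed / counts


# Random numbers stand in for real ESM2 outputs (illustration only).
rng = np.random.default_rng(0)
residue_embeddings = rng.normal(size=(2, 5, EMBED_DIM))
attention_mask = np.array([[1, 1, 1, 1, 1],
                           [1, 1, 1, 0, 0]])

features = mean_pool_embeddings(residue_embeddings, attention_mask)
print(features.shape)  # (2, 1280)
```

With real embeddings in place of the random array, `model.predict(features)` returns one predicted value per row. Whether the released models expect mean-pooled vectors, and how two sequences (for example ACE2 and RBD) are combined into one row, is not stated in this card and should be checked with the maintainer.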
+🔬 Training Details
+•	Feature Extraction: ESM2 embeddings (33-layer transformer, 650M params)
+•	Training Algorithm: CatBoost Gradient Boosting
+•	Dataset: your own dataset
+•	Evaluation Metrics: RMSE, R^2
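For reference, the evaluation metrics named above, together with the MAE listed in the card metadata, can be computed as in this minimal NumPy sketch. The function names and the toy values are illustrative and are not results from these models.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

In practice `y_pred` would be the output of `model.predict` on a held-out set, and `y_true` the measured values for the same samples.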
+📌 Applications
+•	Binding affinity predictions
+💡 Limitations & Considerations
+•	The model is trained on ESM2 embeddings and is limited by the quality of those embeddings.
+•	Performance depends on the training dataset used.
+•	The regressor itself is not a deep-learning model; it applies GBTs to precomputed ESM2 embeddings for fast, interpretable predictions.
+📄 Citation
+👤 Maintainer: hbp5181@psu.edu
+📅 Last Updated: February 2025