How to use pageman/discovery2-cytotoxicity-models with Scikit-learn:
from huggingface_hub import hf_hub_download
import joblib
model = joblib.load(
hf_hub_download("pageman/discovery2-cytotoxicity-models", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html

Pre-trained models for predicting drug cytotoxicity based on promiscuity and molecular structure.
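This repository contains trained models from the Discovery 2 study on selectivity-safety coupling. The models predict cytotoxicity risk based on:
- Promiscuity: the number of active assays (hits), used by the cubic logistic regression models
- Molecular structure: Morgan fingerprints computed from a SMILES string, used by the LightGBM model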
Main Model: cubic_logistic_model.pkl
Class-Specific Models:
- kinase_cubic_model.pkl - For kinase promiscuity (50% threshold: 25 hits)
- nr_cubic_model.pkl - For nuclear receptor promiscuity (50% threshold: 31 hits)
- 7tm_cubic_model.pkl - For GPCR/7TM promiscuity (50% threshold: 63 hits)

File: lightgbm_model.txt
- cubic_model_metadata.json - Performance metrics for main cubic model
- class_models_metadata.json - Thresholds for class-specific models
- lgb_model_metadata.json - LightGBM model performance
- feature_stats.json - Feature statistics for normalization

pip install joblib lightgbm rdkit numpy pandas statsmodels
import joblib
import lightgbm as lgb
import json
# Load cubic logistic regression model
cubic_model = joblib.load('cubic_logistic_model.pkl')
# Load LightGBM model
lgb_model = lgb.Booster(model_file='lightgbm_model.txt')
# Load metadata
with open('cubic_model_metadata.json', 'r') as f:
    cubic_metadata = json.load(f)
with open('feature_stats.json', 'r') as f:
    feature_stats = json.load(f)
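The loading calls above expect the model and metadata files to be in the working directory. One way to get them there is to fetch each file from the Hub first. The helper below is a minimal sketch: `fetch_model_files` and `MODEL_FILES` are illustrative names that do not come from this repository, and the code assumes the files sit at the repository root under the names listed above.

```python
def fetch_model_files(filenames,
                      repo_id="pageman/discovery2-cytotoxicity-models",
                      downloader=None):
    """Download each file and return a {filename: local_path} mapping.

    `downloader` is a hypothetical hook for substituting a stub in tests;
    by default huggingface_hub.hf_hub_download is used.
    """
    if downloader is None:
        # Imported lazily so the helper can be exercised without network access.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    return {name: downloader(repo_id=repo_id, filename=name) for name in filenames}


# Assumption: these names match files at the root of the repository.
MODEL_FILES = [
    "cubic_logistic_model.pkl",
    "lightgbm_model.txt",
    "cubic_model_metadata.json",
    "feature_stats.json",
]

# Typical use (requires network access):
# paths = fetch_model_files(MODEL_FILES)
# cubic_model = joblib.load(paths["cubic_logistic_model.pkl"])
```

The returned paths point into the local Hub cache, so they can be passed directly to `joblib.load`, `lgb.Booster(model_file=...)`, or `open`.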
import numpy as np
from statsmodels.tools import add_constant
def predict_cytotoxicity_from_promiscuity(promiscuity_score, model):
    """
    Predict cytotoxicity probability from promiscuity score

    Args:
        promiscuity_score: Number of active assays (hits)
        model: Loaded cubic logistic regression model

    Returns:
        Probability of cytotoxicity (0-1)
    """
    # Create cubic features
    X = np.array([[promiscuity_score,
                   promiscuity_score**2,
                   promiscuity_score**3]])
    # has_constant='add' forces the intercept column: with a single row,
    # add_constant would otherwise treat every column as constant and skip it
    X_with_const = add_constant(X, has_constant='add')
    # Predict probability
    prob = model.predict(X_with_const)[0]
    return prob
# Example usage
promiscuity = 50
prob = predict_cytotoxicity_from_promiscuity(promiscuity, cubic_model)
print(f"Promiscuity: {promiscuity} hits")
print(f"Cytotoxicity probability: {prob:.2%}")
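For a cubic logistic model, the 50% threshold is the promiscuity score at which the linear predictor b0 + b1*x + b2*x^2 + b3*x^3 crosses zero. The sketch below builds the `[1, x, x^2, x^3]` design matrix by hand, which always yields four columns even for a single score, and solves for that crossing with `np.roots`. The function names are illustrative and the coefficients are made up for demonstration; they are not the fitted values. If the loaded model is a statsmodels results object (an assumption), its coefficients would be available as `model.params`.

```python
import numpy as np


def cubic_design_matrix(scores):
    """Return one row [1, x, x^2, x^3] per promiscuity score."""
    x = np.asarray(scores, dtype=float).reshape(-1)
    return np.column_stack([np.ones_like(x), x, x**2, x**3])


def logistic_probability(scores, coefs):
    """Logistic probability for coefficients ordered [b0, b1, b2, b3]."""
    eta = cubic_design_matrix(scores) @ np.asarray(coefs, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))


def fifty_percent_thresholds(coefs, max_hits=np.inf):
    """Non-negative real scores where the linear predictor is zero (p = 0.5)."""
    b0, b1, b2, b3 = coefs
    roots = np.roots([b3, b2, b1, b0])  # np.roots expects the highest degree first
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real[(real >= 0) & (real <= max_hits)])


# Hypothetical coefficients for illustration only (NOT the fitted model):
# 1e-5 * (x - 50) * (x^2 + 100), whose single real root is x = 50.
example_coefs = [-0.05, 1e-3, -5e-4, 1e-5]

thresholds = fifty_percent_thresholds(example_coefs)
probs = logistic_probability([10, 50, 150], example_coefs)
```

A cubic can cross zero up to three times, so the function returns every non-negative real root rather than assuming a single threshold.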
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
def predict_cytotoxicity_from_smiles(smiles, model):
    """
    Predict cytotoxicity from SMILES string

    Args:
        smiles: SMILES representation of molecule
        model: Loaded LightGBM model

    Returns:
        Probability of cytotoxicity (0-1)
    """
    # Generate Morgan fingerprint
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    fp_array = np.array(fp).reshape(1, -1)
    # Predict
    prob = model.predict(fp_array)[0]
    return prob
# Example usage
smiles = "CC(C)Cc1ccc(cc1)C(C)C(O)=O" # Ibuprofen
prob = predict_cytotoxicity_from_smiles(smiles, lgb_model)
print(f"SMILES: {smiles}")
print(f"Cytotoxicity probability: {prob:.2%}")
# Load class-specific model
kinase_model = joblib.load('kinase_cubic_model.pkl')
# Predict from kinase-specific promiscuity
kinase_hits = 20
prob = predict_cytotoxicity_from_promiscuity(kinase_hits, kinase_model)
print(f"Kinase promiscuity: {kinase_hits} hits")
print(f"Cytotoxicity probability: {prob:.2%}")
Based on the cubic model thresholds:
| Promiscuity Range | Risk Level | Cytotoxicity Probability |
|---|---|---|
| < 43 hits | Low | < 25% |
| 43-102 hits | Moderate | 25-75% |
| > 102 hits | High | > 75% |
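The table can be applied directly as a lookup when only a hit count is available. The function below is a small sketch of that mapping; `risk_level` is an illustrative name, and the handling of the boundary values (43 and 102 both counted as Moderate) follows the ranges as written in the table.

```python
def risk_level(promiscuity_hits):
    """Map a total promiscuity hit count to the risk level from the table."""
    if promiscuity_hits < 0:
        raise ValueError("promiscuity_hits must be non-negative")
    if promiscuity_hits < 43:
        return "Low"       # cytotoxicity probability below 25%
    if promiscuity_hits <= 102:
        return "Moderate"  # cytotoxicity probability between 25% and 75%
    return "High"          # cytotoxicity probability above 75%


for hits in (10, 50, 120):
    print(f"{hits} hits: {risk_level(hits)} risk")
```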
Class-Specific 50% Thresholds:
- Kinase: 25 hits
- Nuclear receptor: 31 hits
- GPCR/7TM: 63 hits
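The class-specific 50% thresholds given with the model files (25 kinase hits, 31 nuclear receptor hits, 63 GPCR/7TM hits) can also serve as a quick screen without loading any model. The sketch below flags the target classes whose hit count reaches its threshold. The dictionary keys and the function name are illustrative choices, and the check assumes that predicted probability increases with hit count around each threshold.

```python
# 50% thresholds from the class-specific model list; the keys are hypothetical labels.
CLASS_THRESHOLDS_50 = {
    "kinase": 25,
    "nuclear_receptor": 31,
    "gpcr_7tm": 63,
}


def classes_at_or_above_threshold(hits_by_class):
    """Return the target classes whose hit count meets or exceeds its 50% threshold."""
    flagged = []
    for target_class, hits in hits_by_class.items():
        if target_class not in CLASS_THRESHOLDS_50:
            raise KeyError(f"Unknown target class: {target_class}")
        if hits >= CLASS_THRESHOLDS_50[target_class]:
            flagged.append(target_class)
    return flagged


example_hits = {"kinase": 20, "nuclear_receptor": 31, "gpcr_7tm": 70}
flagged_classes = classes_at_or_above_threshold(example_hits)
```

For an actual probability rather than a yes/no flag, the corresponding class-specific model should be loaded and evaluated instead.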
If you use these models in your research, please cite:
Discovery 2: Cytotoxicity Prediction Models
Models: https://huggingface.co/pageman/discovery2-cytotoxicity-models
Dataset: https://huggingface.co/datasets/pageman/discovery2-results
These models are provided for research purposes under the CC-BY-NC-SA-4.0 license. Please check with the original data sources for their licensing terms.
For questions or issues, please open a discussion on this repository.