---
language: en
license: mit
tags:
  - toxicity
  - cheminformatics
  - nuclear-receptors
  - sklearn
  - svm
  - rdkit
  - drug-discovery
library_name: sklearn
---

# NR-ToxPred Models

Pre-trained machine learning models for predicting the binding activity of small molecules against **nine human nuclear receptors (NRs)**.

These models are used by the [NR-ToxPred GUI application](https://github.com/gokulalgates/NRToxPred-GUI) — a desktop app that requires no coding experience.

---

## What this repository contains

| Folder | Contents |
|--------|----------|
| `MODELS/morgan/` | SVM classifiers trained on Morgan (ECFP6) fingerprints — one per receptor |
| `MODELS/MACCS/` | SVM classifiers trained on MACCS Keys — one per receptor |
| `MODELS/ARclasses.npy` | Label encoder (Active / Inactive) |
| `X_train/` | Training set SMILES used for Applicability Domain assessment |

> SuperLearner ensemble models are not included here due to their size (1–1.5 GB each).

---

## Receptors covered

| Receptor | Full Name |
|----------|-----------|
| AR | Androgen Receptor |
| ERA | Estrogen Receptor Alpha |
| ERB | Estrogen Receptor Beta |
| FXR | Farnesoid X Receptor |
| GR | Glucocorticoid Receptor |
| PPARD | Peroxisome Proliferator-Activated Receptor Delta |
| PPARG | Peroxisome Proliferator-Activated Receptor Gamma |
| PR | Progesterone Receptor |
| RXR | Retinoid X Receptor |

---

## How to use

### Option A — Desktop GUI (recommended, no coding needed)

Download the NR-ToxPred GUI from GitHub and run the installer. The app will download these models automatically on first launch.

👉 **[NR-ToxPred GUI on GitHub](https://github.com/gokulalgates/NRToxPred-GUI)**

### Option B — Python (programmatic use)

```python
from huggingface_hub import hf_hub_download
import pickle, numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

# Download a model
model_path = hf_hub_download(
    repo_id="gokulalgates/nrtoxpred-models",
    filename="MODELS/morgan/ARsvm_best.model",
    repo_type="model",
)

# Load model
model = pickle.load(open(model_path, "rb"))

# Generate Morgan fingerprint (ECFP6, 1024 bits)
mol = Chem.MolFromSmiles("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")  # bisphenol A
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=1024)
X = np.array(fp).reshape(1, -1)

# Predict
label_enc = {0: "Inactive", 1: "Active"}
pred = model.predict(X)[0]
print(f"AR prediction: {pred}")
```

---

## Model details

| Property | Value |
|----------|-------|
| Algorithm | Support Vector Machine (SVM) |
| Fingerprints | Morgan ECFP6 (radius=3, 1024 bits) and MACCS Keys (167 bits) |
| Framework | scikit-learn 0.23.2 |
| Task | Binary classification (Active / Inactive) |
| Applicability Domain | Tanimoto fingerprint similarity to training set |

---

## Applicability Domain

Each prediction comes with a reliability label:

- **Reliable** — the compound is similar (Tanimoto ≥ 0.25) to at least one training set compound
- **Unreliable** — the compound lies outside the training chemical space; interpret with caution

The `X_train/` folder contains the training set SMILES used to compute these assessments.

---

## Citation

If you use these models in your research, please cite:

> Predicting the binding of small molecules to nuclear receptors using machine learning.
> *Brief Bioinform.* 2022 May 13;23(3):bbac114.
> doi: [10.1093/bib/bbac114](https://doi.org/10.1093/bib/bbac114)

---

## License

MIT License