---
language: en
license: mit
tags:
  - toxicity
  - cheminformatics
  - nuclear-receptors
  - sklearn
  - svm
  - rdkit
  - drug-discovery
library_name: sklearn
---

# NR-ToxPred Models

Pre-trained machine learning models for predicting the binding activity of small molecules against **nine human nuclear receptors (NRs)**.

These models are used by the [NR-ToxPred GUI application](https://github.com/gokulalgates/NRToxPred-GUI), a desktop app that requires no coding experience.

---

## What this repository contains

| Folder | Contents |
|--------|----------|
| `MODELS/morgan/` | SVM classifiers trained on Morgan (ECFP6) fingerprints, one per receptor |
| `MODELS/MACCS/` | SVM classifiers trained on MACCS Keys, one per receptor |
| `MODELS/ARclasses.npy` | Label encoder (Active / Inactive) |
| `X_train/` | Training set SMILES used for Applicability Domain assessment |

> SuperLearner ensemble models are not included here due to their size (1–1.5 GB each).

---

## Receptors covered

| Receptor | Full Name |
|----------|-----------|
| AR | Androgen Receptor |
| ERA | Estrogen Receptor Alpha |
| ERB | Estrogen Receptor Beta |
| FXR | Farnesoid X Receptor |
| GR | Glucocorticoid Receptor |
| PPARD | Peroxisome Proliferator-Activated Receptor Delta |
| PPARG | Peroxisome Proliferator-Activated Receptor Gamma |
| PR | Progesterone Receptor |
| RXR | Retinoid X Receptor |

---

## How to use

### Option A: Desktop GUI (recommended, no coding needed)

Download the NR-ToxPred GUI from GitHub and run the installer. The app will download these models automatically on first launch.

👉 **[NR-ToxPred GUI on GitHub](https://github.com/gokulalgates/NRToxPred-GUI)**

### Option B: Python (programmatic use)

```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download
from rdkit import Chem
from rdkit.Chem import AllChem

# Download a model
model_path = hf_hub_download(
    repo_id="gokulalgates/nrtoxpred-models",
    filename="MODELS/morgan/ARsvm_best.model",
    repo_type="model",
)

# Load the model (it was pickled with scikit-learn 0.23.2)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Generate Morgan fingerprint (ECFP6, 1024 bits)
mol = Chem.MolFromSmiles("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")  # bisphenol A
if mol is None:
    raise ValueError("RDKit could not parse the SMILES string")
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=1024)
X = np.array(fp).reshape(1, -1)

# Predict; map integer labels to names, falling back to the raw value
label_enc = {0: "Inactive", 1: "Active"}
pred = model.predict(X)[0]
print(f"AR prediction: {label_enc.get(pred, pred)}")
```

---

## Model details

| Property | Value |
|----------|-------|
| Algorithm | Support Vector Machine (SVM) |
| Fingerprints | Morgan ECFP6 (radius=3, 1024 bits) and MACCS Keys (167 bits) |
| Framework | scikit-learn 0.23.2 |
| Task | Binary classification (Active / Inactive) |
| Applicability Domain | Tanimoto fingerprint similarity to training set |

---

## Applicability Domain

Each prediction comes with a reliability label:

- **Reliable**: the compound is similar (Tanimoto ≥ 0.25) to at least one training set compound
- **Unreliable**: the compound lies outside the training chemical space; interpret with caution

The `X_train/` folder contains the training set SMILES used to compute these assessments.
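
The reliability rule above can be sketched in plain Python. This is a minimal illustration, not the GUI's actual implementation: the function names `tanimoto` and `assess_reliability` are hypothetical, fingerprints are represented as sets of on-bit indices, and the toy values are not real molecules. The source does not state which fingerprint type the assessment uses, so the sketch leaves that open. With RDKit, a set of this kind could be obtained from a bit vector via `set(fp.GetOnBits())`.

```python
from typing import Iterable, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def assess_reliability(
    query: Set[int],
    training: Iterable[Set[int]],
    threshold: float = 0.25,
) -> Tuple[str, float]:
    """Label a query by its highest similarity to any training fingerprint."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    label = "Reliable" if best >= threshold else "Unreliable"
    return label, best


# Toy fingerprints (hypothetical on-bit indices, not real molecules)
training_set = [{1, 2, 3, 4}, {10, 11, 12}]
print(assess_reliability({1, 2, 3, 9}, training_set))  # shares 3 bits with the first entry
print(assess_reliability({20, 21}, training_set))      # shares no bits with any entry
```

Only the single best match matters under this rule, so one close neighbor in the training set is enough for a "Reliable" label.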

---

## Citation

If you use these models in your research, please cite:

> Predicting the binding of small molecules to nuclear receptors using machine learning.
> *Brief Bioinform.* 2022 May 13;23(3):bbac114.
> doi: [10.1093/bib/bbac114](https://doi.org/10.1093/bib/bbac114)

---

## License

MIT License