A multi-task neural network that screens small molecules across the Tox21 panel of 12 in vitro toxicity assays.
| Task | Target |
|---|---|
| NR-AR | Androgen receptor |
| NR-AR-LBD | Androgen receptor ligand-binding domain |
| NR-AhR | Aryl hydrocarbon receptor |
| NR-Aromatase | Aromatase enzyme inhibition |
| NR-ER | Estrogen receptor alpha |
| NR-ER-LBD | Estrogen receptor ligand-binding domain |
| NR-PPAR-gamma | Peroxisome proliferator-activated receptor gamma |
| SR-ARE | Antioxidant response element |
| SR-ATAD5 | Genotoxicity / DNA damage (ATAD5 reporter) |
| SR-HSE | Heat shock response element |
| SR-MMP | Mitochondrial membrane potential |
| SR-p53 | p53 tumour suppressor / DNA damage |
```python
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from huggingface_hub import hf_hub_download

# Task names, in the order of the model's 12 outputs
TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase",
    "NR-ER", "NR-ER-LBD", "NR-PPAR-gamma",
    "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]


class ToxMLP(nn.Module):
    def __init__(self, n_inputs=2048, n_outputs=12, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, n_outputs),
        )

    def forward(self, x):
        # Returns raw logits, one per task
        return self.net(x)


# Download the trained weights and switch to inference mode
model_path = hf_hub_download("Hari5115/molecular-toxicity-predictor", "best_model.pt")
model = ToxMLP()
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# 2048-bit Morgan fingerprint, radius 2
fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
if mol is None:
    # MolFromSmiles returns None when the SMILES string cannot be parsed
    raise ValueError("Could not parse SMILES")
fp = torch.tensor([list(fp_gen.GetFingerprint(mol))], dtype=torch.float32)

# Sigmoid turns each logit into an independent per-task probability
with torch.no_grad():
    probs = torch.sigmoid(model(fp)).squeeze().numpy()

for task, p in zip(TASKS, probs):
    print(f"{task}: {p:.1%}")
```
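The per-task probabilities printed by the snippet above can be reduced to a short list of flagged assays. The following is a minimal pure-Python sketch of that step. The helper name `flag_tasks` and the 0.5 cutoff are assumptions made for illustration and are not part of this model card; in practice a cutoff would normally be tuned per task on validation data. The probabilities used here are placeholder values, not model output.

```python
TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase",
    "NR-ER", "NR-ER-LBD", "NR-PPAR-gamma",
    "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]


def flag_tasks(probs, tasks=TASKS, threshold=0.5):
    """Return (task, probability) pairs at or above the cutoff, highest first.

    `threshold` is a hypothetical default, not a calibrated value.
    """
    if len(probs) != len(tasks):
        raise ValueError(f"expected {len(tasks)} probabilities, got {len(probs)}")
    hits = [(t, float(p)) for t, p in zip(tasks, probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Placeholder probabilities for illustration only (not real predictions).
example_probs = [0.02, 0.01, 0.61, 0.10, 0.08, 0.03,
                 0.02, 0.55, 0.04, 0.05, 0.72, 0.09]

flagged = flag_tasks(example_probs)
for task, p in flagged:
    print(f"{task}: {p:.1%}")
```

Because the outputs are independent sigmoids, a molecule can be flagged for several assays at once or for none.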
Test set performance (mean AUC-ROC: 0.8170):
| Task | AUC-ROC |
|---|---|
| NR-AR | 0.752 |
| NR-AR-LBD | 0.890 |
| NR-AhR | 0.892 |
| NR-Aromatase | 0.792 |
| NR-ER | 0.751 |
| NR-ER-LBD | 0.808 |
| NR-PPAR-gamma | 0.742 |
| SR-ARE | 0.803 |
| SR-ATAD5 | 0.847 |
| SR-HSE | 0.811 |
| SR-MMP | 0.847 |
| SR-p53 | 0.868 |
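
The headline figure appears to be the unweighted (macro) average of the twelve per-task scores; that reading is an assumption, since the card does not state the averaging method. A quick check with the rounded values from the table is sketched below. The variable names are illustrative only.

```python
# Per-task test AUC-ROC values copied from the table above (3-decimal rounding).
test_auc = {
    "NR-AR": 0.752, "NR-AR-LBD": 0.890, "NR-AhR": 0.892, "NR-Aromatase": 0.792,
    "NR-ER": 0.751, "NR-ER-LBD": 0.808, "NR-PPAR-gamma": 0.742,
    "SR-ARE": 0.803, "SR-ATAD5": 0.847, "SR-HSE": 0.811, "SR-MMP": 0.847,
    "SR-p53": 0.868,
}

# Unweighted mean: every task counts equally, regardless of its sample count.
mean_auc = sum(test_auc.values()) / len(test_auc)

best = max(test_auc, key=test_auc.get)
worst = min(test_auc, key=test_auc.get)

print(f"mean AUC-ROC over {len(test_auc)} tasks: {mean_auc:.3f}")
print(f"best: {best} ({test_auc[best]:.3f}), worst: {worst} ({test_auc[worst]:.3f})")
```

The rounded table values average to roughly 0.8169, which agrees with the reported 0.8170 to within the rounding of the individual entries.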
Random Forest baseline (val): 0.8306 → MLP (val): 0.8544
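
Tox21 labels are sparse: not every compound was measured in every assay, so a per-task AUC-ROC is usually computed over the labelled compounds only. The card does not say how its scores were computed. The sketch below shows one common approach in pure Python, under the assumption that missing labels are marked with `None`; the function names `auc_roc` and `mean_auc` and the toy data are invented for illustration.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the chance that a random positive outscores a random negative.

    Entries whose label is None (not measured) are skipped; ties count as half.
    Returns None when only one class is present, since AUC is undefined then.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(label_matrix, score_matrix):
    """Macro-average AUC over tasks; rows are molecules, columns are tasks."""
    per_task = []
    for task_labels, task_scores in zip(zip(*label_matrix), zip(*score_matrix)):
        auc = auc_roc(task_labels, task_scores)
        if auc is not None:  # skip tasks where AUC is undefined
            per_task.append(auc)
    return sum(per_task) / len(per_task) if per_task else None


# Toy data: 4 molecules x 2 tasks, with None where no measurement exists.
labels = [[1, 0], [0, None], [1, 1], [None, 0]]
scores = [[0.9, 0.9], [0.3, 0.7], [0.6, 0.8], [0.5, 0.4]]

print(mean_auc(labels, scores))
```

The pairwise comparison is quadratic in the number of labelled compounds, which is fine for a sketch; a library routine such as scikit-learn's `roc_auc_score` applied to the masked labels would be the usual choice at scale.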
Tox21 data provided by the NIH National Center for Advancing Translational Sciences (NCATS).
Accessed via the scikit-fingerprints/MoleculeNet_Tox21 dataset on Hugging Face.
License: MIT