Molecular Toxicity Predictor

Multi-task neural network that screens small molecules across the Tox21 panel of 12 in-vitro toxicity assays.

Model Description

  • Architecture: Multi-task MLP: 2048 → 1024 → 512 → 12
  • Input: Morgan fingerprints (ECFP4, radius=2, 2048 bits) via RDKit
  • Output: Probability of activity for each of 12 toxicity assays
  • Loss: Masked binary cross-entropy (NaN labels are excluded per task)
  • Metric: Mean AUC-ROC across tasks
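
The masked loss can be sketched in PyTorch as below. This is a minimal illustration, not the author's training code. It assumes labels arrive as a float tensor with NaN for untested entries. The name `masked_bce_loss` is hypothetical, and the card does not say whether the original averages over all observed entries (as here) or per task first.

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over observed labels only (hypothetical helper).

    logits: (batch, n_tasks) raw model outputs, before the sigmoid
    labels: (batch, n_tasks) with 0/1 for tested entries and NaN for untested ones
    """
    mask = ~torch.isnan(labels)                  # True where a label exists
    # Replace NaN with a placeholder: 0 * NaN would still be NaN after masking
    targets = torch.nan_to_num(labels, nan=0.0)
    per_entry = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    per_entry = per_entry * mask                 # zero out missing entries
    # Average over observed entries; clamp avoids 0/0 for an all-NaN batch
    return per_entry.sum() / mask.sum().clamp(min=1)
```

Because the mask is applied before averaging, a molecule that was tested on only a few assays still contributes to those tasks without being counted as inactive on the rest.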

Tox21 Assays

Task            Target
NR-AR           Androgen receptor
NR-AR-LBD       Androgen receptor ligand-binding domain
NR-AhR          Aryl hydrocarbon receptor
NR-Aromatase    Aromatase enzyme inhibition
NR-ER           Estrogen receptor alpha
NR-ER-LBD       Estrogen receptor alpha ligand-binding domain
NR-PPAR-gamma   Peroxisome proliferator-activated receptor gamma
SR-ARE          Antioxidant response element
SR-ATAD5        Genotoxicity / DNA damage (ATAD5 reporter)
SR-HSE          Heat shock response element
SR-MMP          Mitochondrial membrane potential
SR-p53          p53 tumour suppressor / DNA damage

Training Data

  • Dataset: Tox21 via MoleculeNet
  • Train: 6,264 molecules | Val: 783 | Test: 784 (80/10/10 split from 7,831 total)
  • Labels are sparse (~17% NaN on average); untested entries are treated as missing, not negative
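
The split sizes are consistent with flooring 80% and 10% of the 7,831 molecules and assigning the remainder to the test set. The sketch below reproduces those sizes under the assumption of a random shuffle. The card does not state whether the original split was random or scaffold-based, and the seed here is hypothetical, so the actual membership of each split will differ.

```python
import numpy as np

def split_indices(n: int, train_frac: float = 0.8, val_frac: float = 0.1, seed: int = 0):
    """Randomly partition range(n) into train/val/test index arrays.

    Train and val sizes are floored; the remainder goes to test.
    The seed is a hypothetical choice, not the one used for this model.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(7831)
print(len(train_idx), len(val_idx), len(test_idx))  # 6264 783 784
```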

Usage

import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from huggingface_hub import hf_hub_download

TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase",
    "NR-ER", "NR-ER-LBD", "NR-PPAR-gamma",
    "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]

# Must match the checkpoint: 2048 -> 1024 -> 512 -> 12, one logit per assay
class ToxMLP(nn.Module):
    def __init__(self, n_inputs=2048, n_outputs=12, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 512),      nn.BatchNorm1d(512),  nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, n_outputs),
        )
    def forward(self, x):
        return self.net(x)

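# Download the trained weights from the Hugging Face Hub and load them on CPU
model_path = hf_hub_download("Hari5115/molecular-toxicity-predictor", "best_model.pt")
model = ToxMLP()
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # disable dropout and use BatchNorm running statistics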

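# ECFP4-style Morgan fingerprint: radius 2, 2048 bits, matching the model input
fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
mol    = Chem.MolFromSmiles(smiles)
if mol is None:
    raise ValueError(f"RDKit could not parse SMILES: {smiles}")
fp     = torch.tensor([list(fp_gen.GetFingerprint(mol))], dtype=torch.float32)  # shape (1, 2048)

with torch.no_grad():
    # Sigmoid maps each of the 12 logits to an independent per-task probability
    probs = torch.sigmoid(model(fp)).squeeze().numpy()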

for task, p in zip(TASKS, probs):
    print(f"{task}: {p:.1%}")
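
As a rough sanity check on model size, the parameter count implied by the ToxMLP layer sizes can be worked out by hand. This sketch counts the weights and biases of each nn.Linear plus the learnable scale and shift of each nn.BatchNorm1d. BatchNorm running statistics are buffers rather than parameters and are excluded. The helper names are hypothetical.

```python
def linear_params(n_in: int, n_out: int) -> int:
    # Weight matrix (n_out x n_in) plus one bias per output unit
    return n_in * n_out + n_out

def batchnorm_params(n_features: int) -> int:
    # Learnable gamma (scale) and beta (shift) per feature
    return 2 * n_features

total = (
    linear_params(2048, 1024) + batchnorm_params(1024)
    + linear_params(1024, 512) + batchnorm_params(512)
    + linear_params(512, 12)
)
print(f"{total:,}")  # 2,632,204
```

With PyTorch installed, `sum(p.numel() for p in model.parameters())` on the model from the usage example should give the same figure. About 80% of it sits in the first 2048 → 1024 layer.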

Results

Test set performance (mean AUC-ROC: 0.8170):

Task            AUC-ROC
NR-AR           0.752
NR-AR-LBD       0.890
NR-AhR          0.892
NR-Aromatase    0.792
NR-ER           0.751
NR-ER-LBD       0.808
NR-PPAR-gamma   0.742
SR-ARE          0.803
SR-ATAD5        0.847
SR-HSE          0.811
SR-MMP          0.847
SR-p53          0.868

Validation mean AUC-ROC: Random Forest baseline 0.8306, MLP 0.8544
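
The headline number is the unweighted mean of the per-task AUC-ROC values. Below is a minimal NumPy sketch that assumes untested entries are NaN (as in training) and drops them per task before scoring. It uses the pairwise (Mann-Whitney) form of AUC, where a tied positive/negative pair counts as one half, rather than any particular library routine. The function names are hypothetical.

```python
import numpy as np

def auc_roc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined without both classes
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def mean_task_auc(labels: np.ndarray, probs: np.ndarray) -> float:
    """Mean AUC over tasks; labels has shape (n_mols, n_tasks) with NaN for untested entries."""
    aucs = []
    for t in range(labels.shape[1]):
        observed = ~np.isnan(labels[:, t])  # drop untested molecules for this task
        aucs.append(auc_roc(labels[observed, t], probs[observed, t]))
    return float(np.nanmean(aucs))  # skip tasks where AUC is undefined
```

The pairwise form builds an n_pos × n_neg matrix per task, which is fine at Tox21 test-set scale. For much larger sets, a rank-based implementation is the usual choice.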

Limitations

  • Morgan fingerprints capture local chemical structure but miss 3D conformation and long-range interactions.
  • The model is trained on a relatively small dataset (~6,000 molecules); extrapolation to novel chemical classes may be unreliable.
  • For research and educational purposes only. Not a substitute for certified toxicological testing.
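
A simple heuristic for acting on the extrapolation caveat is to flag query molecules whose nearest training-set neighbour is dissimilar, measured by Tanimoto similarity on the same 2048-bit fingerprints. This check is not part of the released model. The 0.4 cutoff and the function names are hypothetical, and a real cutoff would need calibrating on held-out data.

```python
import numpy as np

def tanimoto_to_set(query: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Tanimoto similarity between one bit vector and each row of a reference matrix.

    query: (n_bits,) array of 0/1; reference: (n_refs, n_bits) array of 0/1.
    """
    q = query.astype(bool)
    r = reference.astype(bool)
    intersection = (r & q).sum(axis=1)
    union = (r | q).sum(axis=1)
    # Define similarity as 0 when both vectors are empty to avoid 0/0
    return np.where(union > 0, intersection / np.maximum(union, 1), 0.0)

def outside_domain(query: np.ndarray, train_fps: np.ndarray, cutoff: float = 0.4) -> bool:
    """Flag a query whose most similar training molecule falls below a (hypothetical) cutoff."""
    return bool(tanimoto_to_set(query, train_fps).max() < cutoff)
```

In practice, `train_fps` would be the training-set fingerprints stacked row-wise, each built the same way as in the usage example (`list(fp_gen.GetFingerprint(mol))`).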

Dataset Credit

Tox21 data provided by the NIH National Center for Advancing Translational Sciences (NCATS).
Accessed via the scikit-fingerprints/MoleculeNet_Tox21 dataset on Hugging Face.

License

MIT
