Molecular Toxicity Predictor

Multi-task neural network that screens small molecules across the Tox21 panel of 12 in-vitro toxicity assays.

Model Description

Architecture: Multi-task MLP — 2048 → 1024 → 512 → 12
Input: Morgan fingerprints (ECFP4, radius=2, 2048 bits) via RDKit
Output: Probability of activity for each of 12 toxicity assays
Loss: Masked binary cross-entropy (NaN labels are excluded per task)
Metric: Mean AUC-ROC across tasks
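
The masked loss above can be sketched as follows. This is a minimal illustration, not the author's training code. It assumes labels are stored as a float tensor with NaN marking untested (molecule, assay) pairs. The function name `masked_bce_with_logits` and the choice to average over all observed entries (rather than averaging per task first) are assumptions.

```python
import torch
import torch.nn.functional as F


def masked_bce_with_logits(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy averaged over observed (non-NaN) labels only."""
    mask = ~torch.isnan(labels)                       # True where an assay result exists
    safe_labels = torch.nan_to_num(labels, nan=0.0)   # placeholder value, masked out below
    per_entry = F.binary_cross_entropy_with_logits(
        logits, safe_labels, reduction="none"
    )
    per_entry = per_entry * mask                      # zero the loss at missing entries
    return per_entry.sum() / mask.sum().clamp(min=1)  # mean over observed entries


# Two molecules, three assays; NaN marks an untested pair.
logits = torch.tensor([[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]])
labels = torch.tensor([[1.0, float("nan"), 0.0], [float("nan"), 1.0, 0.0]])
loss = masked_bce_with_logits(logits, labels)
```

Because missing entries are multiplied by zero before the reduction, they contribute neither to the loss value nor to the gradient.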

Tox21 Assays

Task	Target
NR-AR	Androgen receptor
NR-AR-LBD	Androgen receptor ligand-binding domain
NR-AhR	Aryl hydrocarbon receptor
NR-Aromatase	Aromatase enzyme inhibition
NR-ER	Estrogen receptor alpha
NR-ER-LBD	Estrogen receptor ligand-binding domain
NR-PPAR-gamma	Peroxisome proliferator-activated receptor gamma
SR-ARE	Antioxidant response element
SR-ATAD5	Genotoxicity / DNA damage (ATAD5 reporter)
SR-HSE	Heat shock response element
SR-MMP	Mitochondrial membrane potential
SR-p53	p53 tumour suppressor / DNA damage

Training Data

Dataset: Tox21 via MoleculeNet
Train: 6,264 molecules | Val: 783 | Test: 784 (80/10/10 split from 7,831 total)
Labels are sparse: about 17% of entries are NaN on average. Untested entries are treated as missing, not as negatives.
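
The split counts above can be reproduced with simple arithmetic. The sketch below assumes the train and validation sizes are floored and the test split takes the remainder. The card does not say whether the split is random, stratified, or scaffold-based, so the helper `split_sizes` is hypothetical and only covers the counts.

```python
def split_sizes(n_total: int, train_frac: float = 0.8, val_frac: float = 0.1):
    """Return (train, val, test) counts; the test split takes the remainder."""
    n_train = int(n_total * train_frac)  # floor of 80%
    n_val = int(n_total * val_frac)      # floor of 10%
    n_test = n_total - n_train - n_val   # whatever is left over
    return n_train, n_val, n_test


print(split_sizes(7831))  # (6264, 783, 784)
```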

Usage

import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from huggingface_hub import hf_hub_download

TASKS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase",
    "NR-ER", "NR-ER-LBD", "NR-PPAR-gamma",
    "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]

class ToxMLP(nn.Module):
    def __init__(self, n_inputs=2048, n_outputs=12, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 512),      nn.BatchNorm1d(512),  nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, n_outputs),
        )
    def forward(self, x):
        return self.net(x)

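# Download the checkpoint from the Hugging Face Hub and load it on CPU.
model_path = hf_hub_download("Hari5115/molecular-toxicity-predictor", "best_model.pt")
model = ToxMLP()
# weights_only=True restricts unpickling to tensors (assumes a plain state_dict checkpoint).
model.load_state_dict(torch.load(model_path, map_location="cpu", weights_only=True))
model.eval()  # use BatchNorm running statistics and disable dropout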

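# Morgan fingerprint with radius 2 and 2048 bits, matching the model input.
fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
mol    = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
if mol is None:  # MolFromSmiles returns None for SMILES it cannot parse
    raise ValueError("Could not parse SMILES string")
fp     = torch.tensor([list(fp_gen.GetFingerprint(mol))], dtype=torch.float32)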

with torch.no_grad():
    probs = torch.sigmoid(model(fp)).squeeze().numpy()

for task, p in zip(TASKS, probs):
    print(f"{task}: {p:.1%}")
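
To score several molecules at once, the single-molecule steps above can be wrapped in a small featurizer. This is a sketch under assumptions: the helper name `featurize` and the policy of silently skipping unparseable SMILES are illustrative choices, not part of the published model code.

```python
import torch
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

N_BITS = 2048
fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=N_BITS)


def featurize(smiles_list):
    """Return (fingerprint tensor, list of SMILES that parsed successfully)."""
    rows, kept = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip strings RDKit cannot parse
            continue
        rows.append(list(fp_gen.GetFingerprint(mol)))
        kept.append(smi)
    # Explicit shape so an empty input still yields a (0, 2048) tensor.
    features = torch.tensor(rows, dtype=torch.float32).reshape(len(rows), N_BITS)
    return features, kept


# "C1CC" has an unclosed ring and is dropped.
features, kept = featurize(["CCO", "C1CC", "c1ccccc1"])
print(features.shape, kept)
```

The resulting tensor can be passed to the model in place of `fp`, for example `torch.sigmoid(model(features))`, which yields one row of 12 probabilities per kept molecule.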

Results

Test set performance (mean AUC-ROC: 0.8170):

Task	AUC-ROC
NR-AR	0.752
NR-AR-LBD	0.890
NR-AhR	0.892
NR-Aromatase	0.792
NR-ER	0.751
NR-ER-LBD	0.808
NR-PPAR-gamma	0.742
SR-ARE	0.803
SR-ATAD5	0.847
SR-HSE	0.811
SR-MMP	0.847
SR-p53	0.868

Validation mean AUC-ROC: Random Forest baseline 0.8306; MLP 0.8544.
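
The mean AUC-ROC metric can be illustrated with a dependency-free sketch. It assumes NaN labels are dropped within each task before scoring and that tasks with only one class present are skipped. The helper names `auc_roc` and `mean_auc` are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would normally compute the per-task value.

```python
import math


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def mean_auc(label_columns, score_columns):
    """Average per-task AUC, dropping NaN labels within each task."""
    aucs = []
    for labels, scores in zip(label_columns, score_columns):
        kept = [(y, s) for y, s in zip(labels, scores) if not math.isnan(y)]
        auc = auc_roc([y for y, _ in kept], [s for _, s in kept])
        if not math.isnan(auc):
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else math.nan


# Toy data: two tasks, four molecules, NaN marks an untested pair.
toy_labels = [[1.0, 0.0, math.nan, 1.0], [0.0, 1.0, 0.0, math.nan]]
toy_scores = [[0.9, 0.2, 0.5, 0.4], [0.6, 0.7, 0.8, 0.1]]
print(mean_auc(toy_labels, toy_scores))  # 0.75

# Consistency check on the per-task test AUCs reported in the table above.
reported = [0.752, 0.890, 0.892, 0.792, 0.751, 0.808,
            0.742, 0.803, 0.847, 0.811, 0.847, 0.868]
print(round(sum(reported) / len(reported), 3))  # 0.817
```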

Limitations

Morgan fingerprints capture local chemical structure but miss 3D conformation and long-range interactions.
The model is trained on a relatively small dataset (6,264 training molecules); extrapolation to novel chemical classes may be unreliable.
For research and educational purposes only. Not a substitute for certified toxicological testing.

Dataset Credit

Tox21 data provided by the NIH National Center for Advancing Translational Sciences (NCATS).
Accessed via scikit-fingerprints/MoleculeNet_Tox21 on Hugging Face.

License

MIT
