---
license: apache-2.0
library_name: pytorch
pipeline_tag: other
tags:
  - chemistry
  - drug-discovery
  - toxicity
  - tox21
  - molecular-property-prediction
  - snn
  - self-normalizing-neural-network
datasets:
  - tox21
metrics:
  - roc_auc
language:
  - en
---

Demo

Open Interactive Demo →

Rasayan Tox21 SNN Ensemble

A Self-Normalizing Neural Network (SNN) ensemble for predicting molecular toxicity across the 12 Tox21 Challenge endpoints. It is trained on the NIH Tox21 dataset using an 11,377-dimensional engineered feature vector and evaluated with 40-fold stratified cross-validation.

Model Description

This model predicts the probability of a molecule being active (toxic) against 12 biological targets from the Tox21 Challenge:

Nuclear Receptor Panel

| Endpoint | Target | Description |
|---|---|---|
| NR-AR | Androgen Receptor | Male reproductive toxicity |
| NR-AR-LBD | AR Ligand Binding Domain | Direct AR modulation |
| NR-AhR | Aryl Hydrocarbon Receptor | Dioxin-like toxicity |
| NR-Aromatase | CYP19A1 | Estrogen synthesis disruption |
| NR-ER | Estrogen Receptor | Endocrine disruption |
| NR-ER-LBD | ER Ligand Binding Domain | Direct ER modulation |
| NR-PPAR-gamma | PPARγ | Metabolic disruption |

Stress Response Panel

| Endpoint | Target | Description |
|---|---|---|
| SR-ARE | Antioxidant Response Element | Oxidative stress |
| SR-ATAD5 | ATAD5 | DNA damage response |
| SR-HSE | Heat Shock Element | Protein folding stress |
| SR-MMP | Mitochondrial Membrane Potential | Mitochondrial toxicity |
| SR-p53 | Tumor Protein p53 | Genotoxicity |

Architecture

| Component | Specification |
|---|---|
| Type | Self-Normalizing Neural Network |
| Ensemble | 10 models (top from 40-fold CV) |
| Hidden Layers | 8 layers × 768 units |
| Activation | SELU |
| Regularization | AlphaDropout (0.1) |
| Output | Sigmoid (12 endpoints) |
| Parameters | ~19M total |
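
The SELU activation is what makes the network "self-normalizing": for standard-normal inputs it returns outputs with mean close to 0 and variance close to 1, so activations stay well scaled through deep stacks. The sketch below checks that property empirically with the standard library only. It is an illustration of the activation, not code from this repository; the sample size and seed are arbitrary choices.

```python
import math
import random

# Fixed constants of the SELU activation (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * math.expm1(x)


def mean_and_variance(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


# Draw standard-normal pre-activations and push them through SELU.
rng = random.Random(0)  # arbitrary seed, for repeatability only
inputs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
outputs = [selu(x) for x in inputs]

out_mean, out_var = mean_and_variance(outputs)
print(f"mean={out_mean:.3f}, variance={out_var:.3f}")
```

AlphaDropout is the dropout variant designed to preserve this mean and variance, which is why it is paired with SELU in the table above.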

Molecular Features (11,377 dimensions)

| Feature Type | Dimensions | Description |
|---|---|---|
| ECFP6 | 8,192 | Extended-connectivity fingerprints (radius 3) |
| MACCS Keys | 167 | Structural keys for substructure screening |
| RDKit Descriptors | 208 | Physicochemical properties |
| Toxicophores | 1,868 | SMARTS-based toxicity alerts |
| Structural Filters | 815 | PAINS, BRENK, NIH, ZINC alerts |
| Target Similarity | 127 | Tanimoto similarity to known ligands |
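
As a quick consistency check, the block sizes in the table sum to the stated 11,377 dimensions. The sketch below computes the slice each block would occupy in a concatenated vector. The block sizes come from the table; the concatenation order is an assumption made for illustration, since the actual ordering used by the model is not documented here.

```python
from itertools import accumulate

# Sizes copied from the feature table; the ORDER of blocks inside the real
# feature vector is an assumption for this sketch.
FEATURE_BLOCKS = [
    ("ECFP6", 8192),
    ("MACCS Keys", 167),
    ("RDKit Descriptors", 208),
    ("Toxicophores", 1868),
    ("Structural Filters", 815),
    ("Target Similarity", 127),
]


def block_slices(blocks):
    """Map each block name to the slice it would occupy after concatenation."""
    ends = list(accumulate(size for _, size in blocks))
    starts = [0] + ends[:-1]
    return {
        name: slice(start, end)
        for (name, _), start, end in zip(blocks, starts, ends)
    }


slices = block_slices(FEATURE_BLOCKS)
total_dims = sum(size for _, size in FEATURE_BLOCKS)

for name, sl in slices.items():
    print(f"{name:<20} [{sl.start:>5}, {sl.stop:>5})")
print("total:", total_dims)
```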

Training

| Parameter | Value |
|---|---|
| Dataset | Tox21 Challenge (7,831 compounds) |
| Validation | 40-fold Stratified CV |
| Epochs | 300 |
| Batch Size | 256 |
| Optimizer | AdamW (lr=1e-4, weight_decay=0.01) |
| Loss | Focal Loss (γ=2.5, α=0.25) |
| Regularization | Label Smoothing (0.1), Mixup (α=0.2) |
| CV AUC | 0.882 ± 0.021 |
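
To make the loss settings concrete, here is a minimal standard-library sketch of binary focal loss with the γ, α, and label-smoothing values from the table. How the training code actually combines smoothing with focal loss is not stated in this card; the soft-target form below is one common choice and should be read as an assumption, as are the function names.

```python
import math

GAMMA = 2.5      # focusing parameter (from the training table)
ALPHA = 0.25     # positive-class weight (from the training table)
SMOOTHING = 0.1  # label smoothing (from the training table)


def smooth_label(y: float, eps: float = SMOOTHING) -> float:
    """Pull a hard 0/1 label toward 0.5: 0 -> eps/2, 1 -> 1 - eps/2."""
    return y * (1.0 - eps) + 0.5 * eps


def focal_loss(p: float, y: float, gamma: float = GAMMA, alpha: float = ALPHA) -> float:
    """Binary focal loss for one probability p and a (possibly soft) target y.

    Assumed form: y * [-alpha (1-p)^gamma log p]
                + (1-y) * [-(1-alpha) p^gamma log(1-p)]
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp so log() stays finite
    pos_term = -alpha * (1.0 - p) ** gamma * math.log(p)
    neg_term = -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return y * pos_term + (1.0 - y) * neg_term


easy = focal_loss(0.95, 1.0)                # confident and correct
hard = focal_loss(0.10, 1.0)                # confident and wrong
soft = focal_loss(0.95, smooth_label(1.0))  # same prediction, smoothed target
print(f"easy={easy:.6f} hard={hard:.6f} smoothed={soft:.6f}")
```

The `(1 - p) ** gamma` factor shrinks the contribution of well-classified examples, which helps when active compounds are a small minority, as they are for most Tox21 endpoints.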

Usage

With the Inference API

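import requests

# Send a batch of SMILES strings to the hosted Space.
response = requests.post(
    "https://rasayan-labs-rasayan-tox21.hf.space/predict",
    json={"smiles": ["CCO", "c1ccccc1"]},
    timeout=60,
)
response.raise_for_status()  # fail loudly on HTTP errors

# "predictions" maps each SMILES string to {endpoint: probability}.
predictions = response.json()["predictions"]
for smiles, scores in predictions.items():
    print(f"{smiles}:")
    # Show the three highest-probability endpoints.
    for target, prob in sorted(scores.items(), key=lambda x: -x[1])[:3]:
        print(f"  {target}: {prob:.1%}")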
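
For screening, it is often more useful to keep only the endpoints above a cutoff than to print the top three. The helper below is a small sketch that works on the `predictions` mapping in the shape the loop above iterates (SMILES to endpoint-to-probability). The function name, the 0.5 threshold, and the placeholder payload are all hypothetical; the numbers are not model outputs.

```python
def flag_endpoints(predictions, threshold=0.5):
    """Per molecule, list (endpoint, probability) pairs at or above threshold, highest first."""
    flagged = {}
    for smiles, scores in predictions.items():
        hits = [(target, prob) for target, prob in scores.items() if prob >= threshold]
        flagged[smiles] = sorted(hits, key=lambda item: item[1], reverse=True)
    return flagged


# Placeholder payload in the assumed response shape.
# Keys and values are made up for illustration and are NOT real predictions.
example_predictions = {
    "SMILES_1": {"NR-AR": 0.02, "SR-ARE": 0.07, "SR-MMP": 0.04},
    "SMILES_2": {"NR-AhR": 0.61, "SR-ARE": 0.55, "SR-p53": 0.12},
}

for smiles, hits in flag_endpoints(example_predictions).items():
    print(smiles, hits or "no endpoint above threshold")
```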

Direct Model Loading

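import torch

# Load on CPU. PyTorch 2.6+ defaults to weights_only=True, which rejects
# arbitrary pickled objects; if loading fails for that reason, pass
# weights_only=False, and only for a checkpoint you trust.
checkpoint = torch.load("ensemble.pt", map_location="cpu")

# Entries stored in the checkpoint.
scalers = checkpoint["scalers"]
feature_indices = checkpoint["feature_indices"]
models = checkpoint["models"]

print(f"Loaded {len(models)} ensemble members")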
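
Once each ensemble member has produced its 12 endpoint probabilities, the members need to be combined into one prediction. This card does not say how that is done; the sketch below assumes a plain mean across members, which is a common default. The function name and the placeholder numbers are hypothetical, and no PyTorch is needed to follow the idea.

```python
from statistics import mean


def ensemble_mean(member_outputs):
    """Average per-endpoint probabilities across ensemble members.

    member_outputs: one list of probabilities per member, all the same length.
    """
    if not member_outputs:
        raise ValueError("need at least one ensemble member")
    n_endpoints = len(member_outputs[0])
    if any(len(row) != n_endpoints for row in member_outputs):
        raise ValueError("all members must predict the same number of endpoints")
    # zip(*rows) walks the outputs endpoint by endpoint.
    return [mean(column) for column in zip(*member_outputs)]


# Placeholder outputs: 3 hypothetical members, 2 endpoints (not real predictions).
member_outputs = [
    [0.20, 0.70],
    [0.30, 0.90],
    [0.10, 0.80],
]
averaged = ensemble_mean(member_outputs)
print([round(p, 3) for p in averaged])
```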

Files

| File | Description |
|---|---|
| ensemble.pt | PyTorch checkpoint with 10 models + scalers |
| config.json | Model configuration |
| toxicophores_validated.json | 1,868 toxicophore SMARTS patterns |
| target_ligands_validated.json | Reference ligands for similarity |

Intended Use

This model is intended for:

  • Early-stage drug discovery toxicity screening
  • Prioritization of compounds for experimental testing
  • Educational purposes in computational toxicology

Limitations

  • Trained on Tox21 assay data which may not capture all toxicity mechanisms
  • Performance may vary for chemical spaces outside the training domain
  • Should not replace experimental validation
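
One simple way to act on the training-domain limitation is a nearest-neighbor similarity check: if a query molecule's fingerprint is not similar to any training fingerprint, treat its prediction with extra caution. The sketch below uses Tanimoto similarity on sets of on-bit indices. It is a generic heuristic, not part of this model; the toy bit sets and the 0.3 cutoff are hypothetical, and real use would compute fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_training_similarity(query_bits, training_bits):
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query_bits, ref) for ref in training_bits), default=0.0)


# Toy on-bit sets standing in for fingerprints (illustrative only).
training_bits = [{1, 4, 9, 16}, {2, 3, 5, 7, 11}]
in_domain_query = {1, 4, 9, 25}
out_of_domain_query = {100, 200, 300}

DOMAIN_CUTOFF = 0.3  # hypothetical threshold; tune on held-out data
for name, query in [("in-domain", in_domain_query), ("out-of-domain", out_of_domain_query)]:
    sim = nearest_training_similarity(query, training_bits)
    print(f"{name}: max similarity {sim:.2f}, trusted={sim >= DOMAIN_CUTOFF}")
```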

Citation

@misc{rasayan-tox21-2026,
  author = {Rasayan Labs},
  title = {Rasayan Tox21 SNN Ensemble},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/rasayan-labs/rasayan-tox21-snn}
}

License

Apache 2.0


Built by Rasayan Labs