license: apache-2.0
library_name: pytorch
pipeline_tag: other
tags:
  - chemistry
  - drug-discovery
  - toxicity
  - tox21
  - molecular-property-prediction
  - snn
  - self-normalizing-neural-network
datasets:
  - tox21
metrics:
  - roc_auc
language:
  - en

Rasayan Tox21 SNN Ensemble

A Self-Normalizing Neural Network (SNN) ensemble that predicts molecular toxicity across the 12 Tox21 Challenge endpoints. It was trained on the NIH Tox21 dataset (7,831 compounds) using 11,377 engineered molecular features and 40-fold stratified cross-validation.

Model Description

For each of 12 biological targets from the Tox21 Challenge, this model predicts the probability that a molecule is active (toxic) in the corresponding assay:

Nuclear Receptor Panel

Endpoint	Target	Description
NR-AR	Androgen Receptor	Male reproductive toxicity
NR-AR-LBD	AR Ligand Binding Domain	Direct AR modulation
NR-AhR	Aryl Hydrocarbon Receptor	Dioxin-like toxicity
NR-Aromatase	CYP19A1	Estrogen synthesis disruption
NR-ER	Estrogen Receptor	Endocrine disruption
NR-ER-LBD	ER Ligand Binding Domain	Direct ER modulation
NR-PPAR-gamma	PPARγ	Metabolic disruption

Stress Response Panel

Endpoint	Target	Description
SR-ARE	Antioxidant Response Element	Oxidative stress
SR-ATAD5	ATAD5	DNA damage response
SR-HSE	Heat Shock Element	Protein folding stress
SR-MMP	Mitochondrial Membrane Potential	Mitochondrial toxicity
SR-p53	Tumor Protein p53	Genotoxicity
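The model produces one probability per endpoint. The following minimal sketch attaches endpoint names to a 12-element output vector. It assumes the outputs follow the order of the two tables above, which is also the MoleculeNet Tox21 column order. Check that order against `config.json` (if it lists the endpoints) before relying on it. `TOX21_ENDPOINTS`, `name_predictions`, and `by_panel` are hypothetical helpers, not part of the released files.

```python
# Endpoint order is ASSUMED to match the tables above (MoleculeNet Tox21 column order).
TOX21_ENDPOINTS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]


def name_predictions(probs):
    """Map a 12-element probability vector to {endpoint: probability}."""
    if len(probs) != len(TOX21_ENDPOINTS):
        raise ValueError(f"expected {len(TOX21_ENDPOINTS)} values, got {len(probs)}")
    return dict(zip(TOX21_ENDPOINTS, (float(p) for p in probs)))


def by_panel(named):
    """Split named predictions into the nuclear receptor (NR) and stress response (SR) panels."""
    panels = {"NR": {}, "SR": {}}
    for endpoint, prob in named.items():
        panels[endpoint.split("-", 1)[0]][endpoint] = prob
    return panels


# Usage with made-up probabilities (illustrative only):
named = name_predictions([0.05] * 11 + [0.62])
print(by_panel(named)["SR"]["SR-p53"])  # 0.62
```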

Architecture

Component	Specification
Type	Self-Normalizing Neural Network
Ensemble	10 models (top 10 of the 40 cross-validation fold models)
Hidden Layers	8 layers × 768 units
Activation	SELU
Regularization	AlphaDropout (0.1)
Output	Sigmoid (12 endpoints)
Parameters	~19M total
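The table above can be read as the following PyTorch sketch of one ensemble member plus probability averaging. This is an assumption-laden reconstruction, not the released code. The layer ordering (Linear, SELU, AlphaDropout), the LeCun-normal initialization (the standard choice for SELU networks), and averaging members with a plain mean are all assumed. `SNNMember` and `ensemble_predict` are hypothetical names.

```python
import torch
from torch import nn


class SNNMember(nn.Module):
    """One ensemble member: SELU MLP with AlphaDropout and 12 outputs (layout assumed)."""

    def __init__(self, in_features, hidden=768, n_layers=8, n_outputs=12, dropout=0.1):
        super().__init__()
        layers, width = [], in_features
        for _ in range(n_layers):
            layers += [nn.Linear(width, hidden), nn.SELU(), nn.AlphaDropout(dropout)]
            width = hidden
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, n_outputs)
        # LeCun-normal init (std = 1/sqrt(fan_in)), which SELU self-normalization assumes.
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, std=m.in_features ** -0.5)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.head(self.body(x))  # logits; apply sigmoid for probabilities


@torch.no_grad()
def ensemble_predict(members, x):
    """Average per-member sigmoid probabilities (plain mean is an assumption)."""
    probs = [torch.sigmoid(m.eval()(x)) for m in members]
    return torch.stack(probs).mean(dim=0)


# Usage with random weights and a small input width, for shape checking only:
members = [SNNMember(in_features=64) for _ in range(3)]
out = ensemble_predict(members, torch.randn(5, 64))
print(out.shape)  # torch.Size([5, 12])
```

`sum(p.numel() for p in member.parameters())` gives the per-member parameter count for a given input width. That width depends on how many of the 11,377 features the checkpoint's `feature_indices` retains, which this card does not state.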

Molecular Features (11,377 dimensions)

Feature Type	Dimensions	Description
ECFP6	8,192	Extended-connectivity fingerprints (radius 3)
MACCS Keys	167	Structural keys for substructure screening
RDKit Descriptors	208	Physicochemical properties
Toxicophores	1,868	SMARTS-based toxicity alerts
Structural Filters	815	PAINS, BRENK, NIH, ZINC alerts
Target Similarity	127	Tanimoto similarity to known ligands
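The six feature blocks sum to exactly 11,377 dimensions (8,192 + 167 + 208 + 1,868 + 815 + 127). The following plain-Python sketch computes where each block would sit in the concatenated vector and defines Tanimoto similarity as used by the target-similarity block. The block order is assumed to follow the table. Representing fingerprints as sets of on-bit indices is an illustrative choice; a real pipeline would more likely operate on RDKit bit vectors. `FEATURE_BLOCKS`, `block_slices`, and `tanimoto` are hypothetical names.

```python
# Block order is ASSUMED to follow the table; the released concatenation order may differ.
FEATURE_BLOCKS = [
    ("ECFP6", 8192),
    ("MACCS", 167),
    ("RDKit descriptors", 208),
    ("Toxicophores", 1868),
    ("Structural filters", 815),
    ("Target similarity", 127),
]


def block_slices(blocks):
    """Return ({name: slice}, total_width) for blocks laid out back to back."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    # Two empty fingerprints: return 0.0 by convention here.
    return len(bits_a & bits_b) / union if union else 0.0


slices, total = block_slices(FEATURE_BLOCKS)
print(total)                            # 11377
print(slices["MACCS"])                  # slice(8192, 8359, None)
print(tanimoto({1, 2, 3}, {2, 3, 4}))   # 0.5
```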

Training

Parameter	Value
Dataset	Tox21 Challenge (7,831 compounds)
Validation	40-fold Stratified CV
Epochs	300
Batch Size	256
Optimizer	AdamW (lr=1e-4, weight_decay=0.01)
Loss	Focal Loss (γ=2.5, α=0.25)
Regularization	Label Smoothing (0.1), Mixup (α=0.2)
CV ROC-AUC	0.882 ± 0.021
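The following minimal PyTorch sketch covers the loss and augmentation listed in the table: focal loss with γ = 2.5 and α = 0.25 applied to label-smoothed targets, and mixup with λ drawn from Beta(0.2, 0.2). The focal term mirrors the formulation of `torchvision.ops.sigmoid_focal_loss`. This card does not specify the following details, so they are assumptions: the symmetric smoothing variant, the missing-label mask, and the way mixup combines masks. Tox21 labels are sparse, so some handling of missing labels is needed. `focal_loss` and `mixup` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.5, alpha=0.25, smoothing=0.1, mask=None):
    """Binary focal loss on smoothed targets, averaged over labelled entries."""
    # Symmetric label smoothing (assumed variant): 1 -> 0.95, 0 -> 0.05 for smoothing=0.1.
    targets = targets * (1 - smoothing) + 0.5 * smoothing
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability assigned to the target
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = alpha_t * (1 - p_t) ** gamma * bce
    if mask is None:
        return loss.mean()
    # mask: 1 where a label exists, 0 where missing (missing labels stored as 0, assumed)
    return (loss * mask).sum() / mask.sum().clamp(min=1)


def mixup(x, y, mask, alpha=0.2):
    """Blend each sample with a shuffled partner using lambda ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix, mask * mask[perm]  # keep labels present in both samples


# One illustrative step on random data (shapes only, not the real pipeline):
model = torch.nn.Linear(32, 12)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
x = torch.randn(256, 32)
y = (torch.rand(256, 12) > 0.9).float()
mask = (torch.rand(256, 12) > 0.2).float()
x, y, mask = mixup(x, y, mask)
loss = focal_loss(model(x), y, mask=mask)
opt.zero_grad()
loss.backward()
opt.step()
```

With `gamma=0`, `alpha=0.5`, and `smoothing=0`, the function reduces to half the ordinary binary cross-entropy. This gives a quick sanity check when adapting it.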

Usage

Via the hosted Space API

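import requests

# Generous timeout: a sleeping Space can take a while to start up
response = requests.post(
    "https://rasayan-labs-rasayan-tox21.hf.space/predict",
    json={"smiles": ["CCO", "c1ccccc1"]},
    timeout=120,
)
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below

# Response shape: {"predictions": {smiles: {endpoint: probability}}}
predictions = response.json()["predictions"]
for smiles, scores in predictions.items():
    print(f"{smiles}:")
    # Show the three endpoints with the highest predicted probability
    for target, prob in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]:
        print(f"  {target}: {prob:.1%}")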

Direct Model Loading

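import torch
from huggingface_hub import hf_hub_download

# Fetch ensemble.pt from the model repository (repo id from the citation URL)
path = hf_hub_download(repo_id="rasayan-labs/rasayan-tox21-snn", filename="ensemble.pt")

# PyTorch >= 2.6 defaults to weights_only=True, which rejects non-tensor objects
# such as fitted scalers. Only pass weights_only=False for files you trust:
# unpickling can execute arbitrary code.
checkpoint = torch.load(path, map_location="cpu", weights_only=False)

scalers = checkpoint["scalers"]
feature_indices = checkpoint["feature_indices"]
models = checkpoint["models"]

print(f"Loaded {len(models)} ensemble members")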

Files

File	Description
`ensemble.pt`	PyTorch checkpoint with 10 models + scalers
`config.json`	Model configuration
`toxicophores_validated.json`	1,868 toxicophore SMARTS patterns
`target_ligands_validated.json`	Reference ligands for similarity

Intended Use

This model is intended for:

Early-stage drug discovery toxicity screening
Prioritization of compounds for experimental testing
Educational purposes in computational toxicology

Limitations

Trained only on Tox21 assay data, which may not capture all toxicity mechanisms
Performance may degrade for chemical space outside the training domain
Predictions should not replace experimental validation

Citation

@misc{rasayan-tox21-2026,
  author = {Rasayan Labs},
  title = {Rasayan Tox21 SNN Ensemble},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/rasayan-labs/rasayan-tox21-snn}
}

License

Apache 2.0

Built by Rasayan Labs