---
title: Rasayan Tox21 Classifier
emoji: ☠️
colorFrom: red
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: SNN ensemble for Tox21 toxicity prediction
tags:
- toxicity
- tox21
- drug-discovery
- chemistry
- snn
- molecular-property-prediction
models:
- rasayan-labs/rasayan-tox21-snn
---

# Rasayan Tox21 Classifier

<p align="center">
  <img src="https://img.shields.io/badge/Tox21-Challenge-red" alt="Tox21">
  <img src="https://img.shields.io/badge/Architecture-SNN-blue" alt="SNN">
  <img src="https://img.shields.io/badge/Endpoints-12-green" alt="12 Endpoints">
  <img src="https://img.shields.io/badge/License-Apache_2.0-yellow" alt="License">
</p>

A production-ready **Self-Normalizing Neural Network (SNN) ensemble** for predicting molecular toxicity across the 12 Tox21 Challenge endpoints. Built for the [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard).

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | Ensemble of 10 SNNs (top 10 folds of a 40-fold CV) |
| **Parameters** | ~19M total |
| **Hidden Layers** | 8 layers × 768 units |
| **Activation** | SELU + AlphaDropout |
| **Training** | 300 epochs, 40-fold stratified CV |
| **CV AUC** | 0.882 ± 0.021 |

## Molecular Features (11,377 total)

| Feature Type | Dimensions | Description |
|--------------|------------|-------------|
| **ECFP6** | 8,192 | Extended-connectivity fingerprints (radius 3) |
| **MACCS Keys** | 167 | Structural keys for substructure screening |
| **RDKit Descriptors** | 208 | Physicochemical properties (LogP, TPSA, MW, etc.) |
| **Toxicophores** | 1,868 | SMARTS-based toxicity structural alerts |
| **Structural Filters** | 815 | PAINS, BRENK, NIH, ZINC filter alerts |
| **Target Similarity** | 127 | Tanimoto similarity to known receptor ligands |

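
If the six feature blocks are concatenated in the order listed in the table (an assumption; the layout of the final vector is not specified here), the column range of each block follows from the dimensions alone. The sketch below computes those ranges and checks the stated total. `FEATURE_BLOCKS` and `block_slices` are illustrative names, not part of the published model code.

```python
from itertools import accumulate

# Block sizes copied from the feature table; the ordering is an assumption.
FEATURE_BLOCKS = [
    ("ECFP6", 8192),
    ("MACCS Keys", 167),
    ("RDKit Descriptors", 208),
    ("Toxicophores", 1868),
    ("Structural Filters", 815),
    ("Target Similarity", 127),
]


def block_slices(blocks):
    """Map each block name to its (start, stop) column range, stop exclusive."""
    stops = list(accumulate(size for _, size in blocks))
    starts = [0] + stops[:-1]
    return {
        name: (start, stop)
        for (name, _), start, stop in zip(blocks, starts, stops)
    }


slices = block_slices(FEATURE_BLOCKS)
total = sum(size for _, size in FEATURE_BLOCKS)

for name, (start, stop) in slices.items():
    print(f"{name:<20} columns {start:>6} to {stop - 1:>6}")
print(f"total features: {total}")
```

Ranges like these are handy when per-fold feature selection touches only some blocks (for example ECFP and toxicophores), since the remaining blocks can be passed through by slice.
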
## Training Details

- **Loss Function**: Focal Loss (γ=2.5, α=0.25) for class imbalance
- **Regularization**: Label smoothing (0.1), Mixup augmentation (α=0.2)
- **Feature Selection**: Variance-based selection per fold (ECFP, toxicophores)
- **Normalization**: SquashScaler (StandardScaler → tanh → StandardScaler)
- **Ensemble Selection**: Top-10 folds from 40-fold stratified CV

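
Two of the items above are compact enough to sketch in plain Python. The first function is the standard binary focal loss with the γ and α values listed; how the project combines it with label smoothing and Mixup is not shown here, so treat it as an illustration rather than the training code. The second applies the StandardScaler → tanh → StandardScaler chain to a single feature column, fitting and transforming on the same data for brevity, whereas a real pipeline would fit on the training fold only. The names `focal_loss`, `standardize`, and `squash_scale` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def focal_loss(p, y, gamma=2.5, alpha=0.25, eps=1e-7):
    """Binary focal loss for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)             # keep log() finite
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def standardize(values):
    """Zero-mean, unit-variance scaling; constant columns map to zeros."""
    mean, std = fmean(values), pstdev(values)
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def squash_scale(values):
    """StandardScaler -> tanh -> StandardScaler on a single feature column."""
    squashed = [math.tanh(v) for v in standardize(values)]
    return standardize(squashed)


# A confident correct negative contributes far less than a missed positive.
easy = focal_loss(0.05, 0)
hard = focal_loss(0.05, 1)
print(f"easy negative: {easy:.6f}  missed positive: {hard:.4f}")

# One extreme outlier: tanh bounds it before the second standardization.
column = [0.1, 0.2, 0.3, 0.25, 50.0]
scaled = squash_scale(column)
print([round(v, 3) for v in scaled])
```

The `(1 - p_t) ** gamma` factor is what down-weights easy examples, which matters on Tox21 because most compounds are inactive on most endpoints.
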
## Tox21 Endpoints

### Nuclear Receptor Panel

| Endpoint | Target | Biological Significance |
|----------|--------|------------------------|
| **NR-AR** | Androgen Receptor | Male reproductive toxicity |
| **NR-AR-LBD** | AR Ligand Binding Domain | Direct AR modulation |
| **NR-AhR** | Aryl Hydrocarbon Receptor | Dioxin-like toxicity, carcinogenesis |
| **NR-Aromatase** | CYP19A1 Enzyme | Estrogen synthesis disruption |
| **NR-ER** | Estrogen Receptor | Endocrine disruption |
| **NR-ER-LBD** | ER Ligand Binding Domain | Direct ER modulation |
| **NR-PPAR-gamma** | PPARγ | Metabolic disruption |

### Stress Response Panel

| Endpoint | Target | Biological Significance |
|----------|--------|------------------------|
| **SR-ARE** | Antioxidant Response Element | Oxidative stress |
| **SR-ATAD5** | ATAD5 | DNA damage response |
| **SR-HSE** | Heat Shock Element | Protein folding stress |
| **SR-MMP** | Mitochondrial Membrane Potential | Mitochondrial toxicity |
| **SR-p53** | Tumor Protein p53 | Genotoxicity |

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/metadata` | GET | Model configuration and capabilities |
| `/predict` | POST | Toxicity predictions for SMILES |
| `/health` | GET | Health check |

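
The two GET endpoints can be probed with the standard library alone. The sketch below builds the endpoint URLs from the Space's base URL (the same host used in the Usage examples) and includes a small JSON fetch helper. It assumes `/health` and `/metadata` return JSON, which the table does not state, and the helper names `endpoint_url` and `get_json` are illustrative.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Base URL of the hosted Space, as used in the Usage examples.
BASE_URL = "https://rasayan-labs-rasayan-tox21.hf.space/"


def endpoint_url(path, base=BASE_URL):
    """Join the Space base URL with an endpoint path such as '/health'."""
    return urljoin(base, path.lstrip("/"))


def get_json(path, timeout=30):
    """GET an endpoint and decode its JSON body (performs a network request)."""
    with urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for path in ("/health", "/metadata"):
        print(path, "->", endpoint_url(path))
        # Uncomment to query the live Space. The response fields are not
        # documented here, so inspect the payload instead of assuming keys.
        # print(json.dumps(get_json(path), indent=2))
```

Checking `/health` before sending a large batch to `/predict` is a cheap way to detect a Space that is still starting up.
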
## Usage

### Python

```python
import requests

# Query the hosted Space with two SMILES strings (paracetamol and benzene).
response = requests.post(
    "https://rasayan-labs-rasayan-tox21.hf.space/predict",
    json={"smiles": ["CC(=O)Nc1ccc(O)cc1", "c1ccccc1"]},
    timeout=60,
)
response.raise_for_status()  # fail loudly on HTTP errors

predictions = response.json()["predictions"]
for smiles, scores in predictions.items():
    print(f"{smiles}:")
    # Show the three endpoints with the highest predicted probability.
    for target, prob in sorted(scores.items(), key=lambda x: -x[1])[:3]:
        print(f"  {target}: {prob:.1%}")
```

### cURL

```bash
curl -X POST "https://rasayan-labs-rasayan-tox21.hf.space/predict" \
  -H "Content-Type: application/json" \
  -d '{"smiles": ["CCO", "c1ccccc1"]}'
```

## Response Format

```json
{
  "predictions": {
    "CCO": {
      "NR-AR": 0.041,
      "NR-AR-LBD": 0.040,
      "NR-AhR": 0.049,
      "NR-Aromatase": 0.078,
      "NR-ER": 0.133,
      "NR-ER-LBD": 0.076,
      "NR-PPAR-gamma": 0.058,
      "SR-ARE": 0.100,
      "SR-ATAD5": 0.038,
      "SR-HSE": 0.066,
      "SR-MMP": 0.082,
      "SR-p53": 0.052
    }
  },
  "model_info": {
    "name": "Rasayan Tox21 SNN Ensemble",
    "version": "1.0.0"
  }
}
```

## Interpretation Guide

| Probability | Risk Level | Recommendation |
|-------------|------------|----------------|
| < 0.2 | Minimal | Unlikely to be active |
| 0.2–0.4 | Low | Monitor for chronic exposure |
| 0.4–0.7 | Moderate | Further investigation warranted |
| ≥ 0.7 | High | Strong toxicity signal |

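
The bands in the table translate directly into a lookup. The sketch below treats each lower bound as inclusive, which matches the `< 0.2` and `≥ 0.7` rows but is an assumption for a score of exactly 0.4. `RISK_BANDS` and `risk_level` are illustrative names, not part of the API.

```python
# Thresholds from the interpretation table, highest band first.
# Lower bounds are inclusive (an assumption for a score of exactly 0.4).
RISK_BANDS = [
    (0.7, "High"),
    (0.4, "Moderate"),
    (0.2, "Low"),
    (0.0, "Minimal"),
]


def risk_level(prob):
    """Return the risk band for a predicted probability in [0, 1]."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    for lower, label in RISK_BANDS:
        if prob >= lower:
            return label


# Three of the ethanol ("CCO") scores from the example response.
scores = {"NR-ER": 0.133, "SR-ARE": 0.100, "SR-MMP": 0.082}
for endpoint, prob in scores.items():
    print(f"{endpoint}: {prob:.3f} -> {risk_level(prob)}")
```
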
## References

- **Tox21 Challenge**: [NIH Tox21 Data Challenge](https://tripod.nih.gov/tox21/challenge/)
- **SNN Architecture**: [Klambauer et al., 2017](https://arxiv.org/abs/1706.02515)
- **Leaderboard**: [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard)

## License

Apache 2.0

---

<p align="center">
  Built by <a href="https://rasayan.ai">Rasayan Labs</a>
</p>