---
title: Rasayan Tox21 Classifier
emoji: ☠️
colorFrom: red
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: SNN ensemble for Tox21 toxicity prediction
tags:
  - toxicity
  - tox21
  - drug-discovery
  - chemistry
  - snn
  - molecular-property-prediction
models:
  - rasayan-labs/rasayan-tox21-snn
---

# Rasayan Tox21 Classifier

<p align="center">
  <img src="https://img.shields.io/badge/Tox21-Challenge-red" alt="Tox21">
  <img src="https://img.shields.io/badge/Architecture-SNN-blue" alt="SNN">
  <img src="https://img.shields.io/badge/Endpoints-12-green" alt="12 Endpoints">
  <img src="https://img.shields.io/badge/License-Apache_2.0-yellow" alt="License">
</p>

A production-ready **Self-Normalizing Neural Network (SNN) ensemble** for predicting molecular toxicity across the 12 Tox21 Challenge endpoints. Built for the [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard).

## Model Overview

| Property | Value |
|----------|-------|
| **Architecture** | Ensemble of 10 SNNs (top 10 folds of the 40-fold CV) |
| **Parameters** | ~19M total |
| **Hidden Layers** | 8 layers × 768 units |
| **Activation** | SELU + AlphaDropout |
| **Training** | 300 epochs, 40-fold CV |
| **CV AUC** | 0.882 ± 0.021 |
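
For readers unfamiliar with the SELU activation listed above, here is a minimal sketch of the function as defined in Klambauer et al. (2017). The names `selu`, `SELU_ALPHA`, and `SELU_SCALE` are illustrative and are not taken from this project's code; AlphaDropout is not covered here.

```python
import math

# Constants from Klambauer et al. (2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit for a single value."""
    if x > 0:
        return SELU_SCALE * x
    # Negative inputs saturate towards -SELU_SCALE * SELU_ALPHA.
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 1.0):
        print(value, round(selu(value), 4))
```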

## Molecular Features (11,377 total)

| Feature Type | Dimensions | Description |
|--------------|------------|-------------|
| **ECFP6** | 8,192 | Extended-connectivity fingerprints (radius 3) |
| **MACCS Keys** | 167 | Structural keys for substructure screening |
| **RDKit Descriptors** | 208 | Physicochemical properties (LogP, TPSA, MW, etc.) |
| **Toxicophores** | 1,868 | SMARTS-based toxicity structural alerts |
| **Structural Filters** | 815 | PAINS, BRENK, NIH, ZINC filter alerts |
| **Target Similarity** | 127 | Tanimoto similarity to known receptor ligands |
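
The Target Similarity features rely on Tanimoto similarity between fingerprints. The sketch below shows the standard definition on sets of on-bit indices. The function name and the example bit sets are hypothetical, and the actual pipeline presumably computes this on real fingerprints rather than hand-written sets.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


# Hypothetical on-bit indices for a query molecule and a reference ligand
query = {1, 5, 9, 42}
reference = {5, 9, 42, 100}
print(tanimoto(query, reference))  # 3 shared bits / 5 distinct bits = 0.6
```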

## Training Details

- **Loss Function**: Focal Loss (γ=2.5, α=0.25) for class imbalance
- **Regularization**: Label smoothing (0.1), Mixup augmentation (α=0.2)
- **Feature Selection**: Variance-based selection per fold (ECFP, toxicophores)
- **Normalization**: SquashScaler (StandardScaler → tanh → StandardScaler)
- **Ensemble Selection**: Top-10 folds from 40-fold stratified CV
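
The loss and normalization steps above can be sketched in plain Python. This assumes the alpha-balanced binary form of focal loss and a per-column SquashScaler; neither is confirmed as this project's exact implementation. The function names are hypothetical, and label smoothing and Mixup are left out. In practice the scaling statistics would be fitted on training data and reused at inference, whereas this sketch fits and transforms the same column.

```python
import math
from statistics import fmean, pstdev
from typing import List


def focal_loss(p: float, y: int, gamma: float = 2.5, alpha: float = 0.25) -> float:
    """Alpha-balanced binary focal loss for one probability p and label y."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def standardize(column: List[float]) -> List[float]:
    """Zero-mean, unit-variance scaling using the population standard deviation."""
    mean = fmean(column)
    std = pstdev(column) or 1.0            # constant columns are only centered
    return [(v - mean) / std for v in column]


def squash_scale(column: List[float]) -> List[float]:
    """StandardScaler -> tanh -> StandardScaler on a single feature column."""
    squashed = [math.tanh(v) for v in standardize(column)]
    return standardize(squashed)


if __name__ == "__main__":
    # A confident correct prediction contributes far less loss than a wrong one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    # The outlier (100.0) is compressed by tanh before the second scaling.
    print(squash_scale([1.0, 2.0, 3.0, 100.0]))
```

The tanh step bounds extreme descriptor values, so a single outlier cannot dominate the second standardization.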

## Tox21 Endpoints

### Nuclear Receptor Panel
| Endpoint | Target | Biological Significance |
|----------|--------|------------------------|
| **NR-AR** | Androgen Receptor | Male reproductive toxicity |
| **NR-AR-LBD** | AR Ligand Binding Domain | Direct AR modulation |
| **NR-AhR** | Aryl Hydrocarbon Receptor | Dioxin-like toxicity, carcinogenesis |
| **NR-Aromatase** | CYP19A1 Enzyme | Estrogen synthesis disruption |
| **NR-ER** | Estrogen Receptor | Endocrine disruption |
| **NR-ER-LBD** | ER Ligand Binding Domain | Direct ER modulation |
| **NR-PPAR-gamma** | PPARγ | Metabolic disruption |

### Stress Response Panel
| Endpoint | Target | Biological Significance |
|----------|--------|------------------------|
| **SR-ARE** | Antioxidant Response Element | Oxidative stress |
| **SR-ATAD5** | ATAD5 | DNA damage response |
| **SR-HSE** | Heat Shock Element | Protein folding stress |
| **SR-MMP** | Mitochondrial Membrane Potential | Mitochondrial toxicity |
| **SR-p53** | Tumor Protein p53 | Genotoxicity |

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/metadata` | GET | Model configuration and capabilities |
| `/predict` | POST | Toxicity predictions for SMILES |
| `/health` | GET | Health check |

## Usage

### Python
```python
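import requests

# Send a batch of SMILES strings to the prediction endpoint
response = requests.post(
    "https://rasayan-labs-rasayan-tox21.hf.space/predict",
    json={"smiles": ["CC(=O)Nc1ccc(O)cc1", "c1ccccc1"]},
    timeout=60,
)
response.raise_for_status()  # fail early on HTTP errors

# Print the three highest-scoring endpoints for each molecule
predictions = response.json()["predictions"]
for smiles, scores in predictions.items():
    print(f"{smiles}:")
    for target, prob in sorted(scores.items(), key=lambda x: -x[1])[:3]:
        print(f"  {target}: {prob:.1%}")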
```

### cURL
```bash
curl -X POST "https://rasayan-labs-rasayan-tox21.hf.space/predict" \
  -H "Content-Type: application/json" \
  -d '{"smiles": ["CCO", "c1ccccc1"]}'
```

## Response Format

```json
{
  "predictions": {
    "CCO": {
      "NR-AR": 0.041,
      "NR-AR-LBD": 0.040,
      "NR-AhR": 0.049,
      "NR-Aromatase": 0.078,
      "NR-ER": 0.133,
      "NR-ER-LBD": 0.076,
      "NR-PPAR-gamma": 0.058,
      "SR-ARE": 0.100,
      "SR-ATAD5": 0.038,
      "SR-HSE": 0.066,
      "SR-MMP": 0.082,
      "SR-p53": 0.052
    }
  },
  "model_info": {
    "name": "Rasayan Tox21 SNN Ensemble",
    "version": "1.0.0"
  }
}
```
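
A response in this shape can be post-processed without any network access. The sketch below parses a trimmed copy of the sample payload and ranks endpoints per molecule. The helper name `top_endpoints` is hypothetical and not part of the API.

```python
import json

# Sample payload trimmed from the response format above
SAMPLE = """
{"predictions": {"CCO": {"NR-ER": 0.133, "SR-ARE": 0.100, "SR-ATAD5": 0.038}},
 "model_info": {"name": "Rasayan Tox21 SNN Ensemble", "version": "1.0.0"}}
"""


def top_endpoints(payload: dict, k: int = 3) -> dict:
    """Return the k highest-scoring (endpoint, probability) pairs per SMILES."""
    ranked = {}
    for smiles, scores in payload["predictions"].items():
        ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        ranked[smiles] = ordered[:k]
    return ranked


payload = json.loads(SAMPLE)
print(top_endpoints(payload, k=2))
```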

## Interpretation Guide

| Probability | Risk Level | Recommendation |
|-------------|------------|----------------|
| < 0.2 | Minimal | Unlikely to be active |
| 0.2 to < 0.4 | Low | Monitor for chronic exposure |
| 0.4 to < 0.7 | Moderate | Further investigation warranted |
| ≥ 0.7 | High | Strong toxicity signal |
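
The table can be applied programmatically with a small helper. This is a sketch that assumes each band includes its lower bound (so 0.4 counts as Moderate); the function name `risk_level` is hypothetical.

```python
def risk_level(probability: float) -> str:
    """Map a predicted probability to the risk level from the table above."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability < 0.2:
        return "Minimal"
    if probability < 0.4:
        return "Low"
    if probability < 0.7:
        return "Moderate"
    return "High"


# Example: the NR-ER score for ethanol from the sample response
print(risk_level(0.133))  # Minimal
```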

## References

- **Tox21 Challenge**: [NIH Tox21 Data Challenge](https://tripod.nih.gov/tox21/challenge/)
- **SNN Architecture**: [Klambauer et al., 2017](https://arxiv.org/abs/1706.02515)
- **Leaderboard**: [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard)

## License

Apache 2.0

---

<p align="center">
  Built by <a href="https://rasayan.ai">Rasayan Labs</a>
</p>