root committed on
Commit 0024d0e · 0 Parent(s)

Initial commit: Rasayan Tox21 SNN Ensemble

- 10-fold SNN ensemble for Tox21 toxicity prediction
- 11,369 molecular features (ECFP6, MACCS, RDKit, toxicophores, similarity)
- FastAPI with /metadata and /predict endpoints
- 40-fold CV AUC: 0.882
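
The 11,369-feature total can be checked against the per-block dimensions listed in the README added by this commit. The sketch below is only bookkeeping: the block sizes come from the README table, while the concatenation order and the helper name `block_slices` are assumptions made for illustration, not taken from the repository code.

```python
# Block sizes copied from the README's "Molecular Features" table.
# The concatenation order is an assumption made for illustration.
FEATURE_BLOCKS = [
    ("ECFP6", 8192),
    ("MACCS Keys", 167),
    ("RDKit Descriptors", 208),
    ("Toxicophores", 1868),
    ("Target Similarity", 934),
]


def block_slices(blocks):
    """Map each block name to its column slice in the concatenated vector."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


slices, total = block_slices(FEATURE_BLOCKS)
print(total)                  # total width of the concatenated feature vector
print(slices["MACCS Keys"])   # columns that would hold the MACCS keys
```

Slices like these make it easy to inspect or ablate a single feature family without hard-coding column offsets.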

.gitattributes ADDED
@@ -0,0 +1,2 @@
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ checkpoints/ensemble.pt filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,23 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ RUN apt-get update && apt-get install -y \
+     libxrender1 \
+     libxext6 \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ RUN useradd -m -u 1000 user
+ USER user
+
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,165 @@
+ ---
+ title: Rasayan Tox21 Classifier
+ emoji: ☠️
+ colorFrom: red
+ colorTo: purple
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ license: apache-2.0
+ short_description: SNN ensemble for Tox21 toxicity prediction
+ tags:
+ - toxicity
+ - tox21
+ - drug-discovery
+ - chemistry
+ - snn
+ - molecular-property-prediction
+ ---
+
+ # Rasayan Tox21 Classifier
+
+ <p align="center">
+   <img src="https://img.shields.io/badge/Tox21-Challenge-red" alt="Tox21">
+   <img src="https://img.shields.io/badge/Architecture-SNN-blue" alt="SNN">
+   <img src="https://img.shields.io/badge/Endpoints-12-green" alt="12 Endpoints">
+   <img src="https://img.shields.io/badge/License-Apache_2.0-yellow" alt="License">
+ </p>
+
+ A production-ready **Self-Normalizing Neural Network (SNN) ensemble** for predicting molecular toxicity across the 12 Tox21 Challenge endpoints. Built for the [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard).
+
+ ## Model Overview
+
+ | Property | Value |
+ |----------|-------|
+ | **Architecture** | Ensemble of 10 SNNs (top-10 folds of a 40-fold CV) |
+ | **Parameters** | ~19M total |
+ | **Hidden Layers** | 8 layers × 768 units |
+ | **Activation** | SELU + AlphaDropout |
+ | **Training** | 300 epochs, 40-fold CV |
+ | **CV AUC** | 0.882 ± 0.021 |
+
+ ## Molecular Features (11,369 total)
+
+ | Feature Type | Dimensions | Description |
+ |--------------|------------|-------------|
+ | **ECFP6** | 8,192 | Extended-connectivity fingerprints (radius 3) |
+ | **MACCS Keys** | 167 | Structural keys for substructure screening |
+ | **RDKit Descriptors** | 208 | Physicochemical properties (LogP, TPSA, MW, etc.) |
+ | **Toxicophores** | 1,868 | SMARTS-based toxicity structural alerts |
+ | **Target Similarity** | 934 | Tanimoto similarity to known receptor ligands |
+
+ ## Training Details
+
+ - **Loss Function**: Focal Loss (γ=2.5, α=0.25) for class imbalance
+ - **Regularization**: Label smoothing (0.1), Mixup augmentation (α=0.2)
+ - **Feature Selection**: Variance-based selection per fold (ECFP, toxicophores)
+ - **Normalization**: SquashScaler (StandardScaler → tanh → StandardScaler)
+ - **Ensemble Selection**: Top-10 folds from 40-fold stratified CV
+
+ ## Tox21 Endpoints
+
+ ### Nuclear Receptor Panel
+ | Endpoint | Target | Biological Significance |
+ |----------|--------|------------------------|
+ | **NR-AR** | Androgen Receptor | Male reproductive toxicity |
+ | **NR-AR-LBD** | AR Ligand Binding Domain | Direct AR modulation |
+ | **NR-AhR** | Aryl Hydrocarbon Receptor | Dioxin-like toxicity, carcinogenesis |
+ | **NR-Aromatase** | CYP19A1 Enzyme | Estrogen synthesis disruption |
+ | **NR-ER** | Estrogen Receptor | Endocrine disruption |
+ | **NR-ER-LBD** | ER Ligand Binding Domain | Direct ER modulation |
+ | **NR-PPAR-gamma** | PPARγ | Metabolic disruption |
+
+ ### Stress Response Panel
+ | Endpoint | Target | Biological Significance |
+ |----------|--------|------------------------|
+ | **SR-ARE** | Antioxidant Response Element | Oxidative stress |
+ | **SR-ATAD5** | ATAD5 | DNA damage response |
+ | **SR-HSE** | Heat Shock Element | Protein folding stress |
+ | **SR-MMP** | Mitochondrial Membrane Potential | Mitochondrial toxicity |
+ | **SR-p53** | Tumor Protein p53 | Genotoxicity |
+
+ ## API Endpoints
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/metadata` | GET | Model configuration and capabilities |
+ | `/predict` | POST | Toxicity predictions for SMILES |
+ | `/health` | GET | Health check |
+
+ ## Usage
+
+ ### Python
+ ```python
+ import requests
+
+ response = requests.post(
+     "https://aarshit-mittal-rasayan-tox21.hf.space/predict",
+     json={"smiles": ["CC(=O)Nc1ccc(O)cc1", "c1ccccc1"]}
+ )
+
+ predictions = response.json()["predictions"]
+ for smiles, scores in predictions.items():
+     print(f"{smiles}:")
+     for target, prob in sorted(scores.items(), key=lambda x: -x[1])[:3]:
+         print(f" {target}: {prob:.1%}")
+ ```
+
+ ### cURL
+ ```bash
+ curl -X POST "https://aarshit-mittal-rasayan-tox21.hf.space/predict" \
+   -H "Content-Type: application/json" \
+   -d '{"smiles": ["CCO", "c1ccccc1"]}'
+ ```
+
+ ## Response Format
+
+ ```json
+ {
+   "predictions": {
+     "CCO": {
+       "NR-AR": 0.041,
+       "NR-AR-LBD": 0.040,
+       "NR-AhR": 0.049,
+       "NR-Aromatase": 0.078,
+       "NR-ER": 0.133,
+       "NR-ER-LBD": 0.076,
+       "NR-PPAR-gamma": 0.058,
+       "SR-ARE": 0.100,
+       "SR-ATAD5": 0.038,
+       "SR-HSE": 0.066,
+       "SR-MMP": 0.082,
+       "SR-p53": 0.052
+     }
+   },
+   "model_info": {
+     "name": "Rasayan Tox21 SNN Ensemble",
+     "version": "1.0.0"
+   }
+ }
+ ```
+
+ ## Interpretation Guide
+
+ | Probability (p) | Risk Level | Recommendation |
+ |-----------------|------------|----------------|
+ | p < 0.2 | Minimal | Unlikely to be active |
+ | 0.2 ≤ p < 0.4 | Low | Monitor for chronic exposure |
+ | 0.4 ≤ p < 0.7 | Moderate | Further investigation warranted |
+ | p ≥ 0.7 | High | Strong toxicity signal |
+
+ ## References
+
+ - **Tox21 Challenge**: [NIH Tox21 Data Challenge](https://tripod.nih.gov/tox21/challenge/)
+ - **SNN Architecture**: [Klambauer et al., 2017](https://arxiv.org/abs/1706.02515)
+ - **Leaderboard**: [ml-jku Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard)
+
+ ## License
+
+ Apache 2.0
+
+ ---
+
+ <p align="center">
+   Built by <a href="https://rasayan.ai">Rasayan Labs</a>
+ </p>
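
The README describes the normalization step only as "SquashScaler (StandardScaler → tanh → StandardScaler)"; the implementation itself is not part of the files shown in this commit. A minimal NumPy sketch of that three-stage idea follows, assuming per-feature statistics fitted on the training matrix. The class name `SquashScalerSketch` and the `eps` guard are hypothetical choices for illustration, not the repository's API.

```python
import numpy as np


class SquashScalerSketch:
    """Standardize, squash with tanh, then standardize again (per feature)."""

    def __init__(self, eps: float = 1e-8):
        self.eps = eps  # avoids division by zero for constant columns

    def fit(self, X: np.ndarray) -> "SquashScalerSketch":
        X = np.asarray(X, dtype=np.float64)
        # Stage 1: statistics of the raw features.
        self.mean1_ = X.mean(axis=0)
        self.std1_ = X.std(axis=0) + self.eps
        # Stage 2: tanh bounds every standardized value to (-1, 1).
        squashed = np.tanh((X - self.mean1_) / self.std1_)
        # Stage 3: statistics of the squashed features.
        self.mean2_ = squashed.mean(axis=0)
        self.std2_ = squashed.std(axis=0) + self.eps
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        X = np.asarray(X, dtype=np.float64)
        squashed = np.tanh((X - self.mean1_) / self.std1_)
        return (squashed - self.mean2_) / self.std2_


rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 3))  # heavy-tailed toy data
X[0, 0] = 1e6                     # one extreme outlier
Z = SquashScalerSketch().fit(X).transform(X)
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))
```

The tanh stage caps the influence of extreme descriptor values before the second standardization restores roughly zero mean and unit variance, which is the input regime SELU networks are designed for.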
app.py ADDED
@@ -0,0 +1,118 @@
+ import sys
+ from pathlib import Path
+ from typing import List, Dict, Any
+
+ ROOT = Path(__file__).parent
+ sys.path.insert(0, str(ROOT))
+
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel, Field
+ import numpy as np
+
+ from src import EnhancedFeatureExtractor, Tox21Ensemble
+
+ app = FastAPI(
+     title="Rasayan Tox21 Classifier",
+     description="Self-Normalizing Neural Network ensemble for Tox21 toxicity prediction",
+     version="1.0.0"
+ )
+
+ TASKS = [
+     "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
+     "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53"
+ ]
+
+ FEATURE_KEYS = [
+     "ecfps", "maccs", "rdkit_descrs", "tox", "rdkit_filters",
+     "similarity", "max_similarity", "db_similarity"
+ ]
+
+ MAX_BATCH_SIZE = 256
+
+ print("Loading model...")
+ extractor = EnhancedFeatureExtractor(
+     toxicophores_path=ROOT / "data" / "toxicophores_validated.json",
+     db_ligands_path=ROOT / "data" / "target_ligands_validated.json",
+ )
+ ensemble = Tox21Ensemble(ROOT / "checkpoints" / "ensemble.pt")
+ print("Model loaded successfully!")
+
+
+ class PredictRequest(BaseModel):
+     smiles: List[str] = Field(..., min_length=1, max_length=MAX_BATCH_SIZE)  # same limit /metadata reports
+
+
+ class PredictResponse(BaseModel):
+     predictions: Dict[str, Dict[str, float]]
+     model_info: Dict[str, Any]
+
+
+ class MetadataResponse(BaseModel):
+     model_name: str
+     version: str
+     max_batch_size: int
+     tox_endpoints: List[str]
+     description: str
+
+
+ @app.get("/metadata", response_model=MetadataResponse)
+ def get_metadata():
+     return {
+         "model_name": "Rasayan Tox21 SNN Ensemble",
+         "version": "1.0.0",
+         "max_batch_size": MAX_BATCH_SIZE,
+         "tox_endpoints": TASKS,
+         "description": "10-fold ensemble of Self-Normalizing Neural Networks trained on Tox21 Challenge data. Features: ECFP6, MACCS, RDKit descriptors, toxicophores, and target similarity."
+     }
+
+
+ @app.post("/predict", response_model=PredictResponse)
+ def predict(request: PredictRequest):
+     smiles_list = request.smiles
+
+     if len(smiles_list) > MAX_BATCH_SIZE:
+         raise HTTPException(status_code=400, detail=f"Maximum {MAX_BATCH_SIZE} SMILES per request")
+
+     if len(smiles_list) == 0:
+         raise HTTPException(status_code=400, detail="At least 1 SMILES required")
+
+     try:
+         features_dict, valid = extractor.extract_features(smiles_list)
+
+         features = np.concatenate(
+             [features_dict[k] for k in FEATURE_KEYS if k in features_dict],
+             axis=1
+         )
+         features = np.nan_to_num(features, nan=0.0, posinf=0.0, neginf=0.0)
+
+         probs = ensemble.predict(features)
+
+         predictions = {}
+         for i, smi in enumerate(smiles_list):
+             if valid[i]:
+                 predictions[smi] = {
+                     task: float(probs[i, j]) for j, task in enumerate(TASKS)
+                 }
+             else:
+                 predictions[smi] = {task: 0.5 for task in TASKS}  # placeholder for unparseable SMILES
+
+         return {
+             "predictions": predictions,
+             "model_info": {
+                 "name": "Rasayan Tox21 SNN Ensemble",
+                 "version": "1.0.0"
+             }
+         }
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+ @app.get("/health")
+ def health():
+     return {"status": "ok"}
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
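
On the client side, two small helpers cover most of what a caller of `/predict` needs: splitting a long SMILES list into batches no larger than the `max_batch_size` reported by `/metadata`, and mapping a returned probability to the risk bands in the README's Interpretation Guide. The sketch below uses only the standard library; the names `chunked` and `risk_level` are hypothetical, and the band boundaries are read as half-open intervals, which is an assumption about how the README table is meant.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def risk_level(prob: float) -> str:
    """Map a probability to the README's interpretation bands."""
    if prob < 0.2:
        return "Minimal"
    if prob < 0.4:
        return "Low"
    if prob < 0.7:
        return "Moderate"
    return "High"


smiles = ["CCO", "c1ccccc1", "CC(=O)Nc1ccc(O)cc1"]
batches = list(chunked(smiles, size=2))
# Each batch would be sent as {"smiles": batch} in the POST /predict body.
print(batches)
print(risk_level(0.133))
```

Because the handler returns 0.5 for every task when a SMILES string fails to parse, a row of twelve identical 0.5 values more likely signals an invalid input than a genuinely moderate risk, so callers may want to check for that pattern before applying `risk_level`.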
checkpoints/ensemble.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9fb42a747fea42436c174c211983782987b706f44b75d6f7cfd02e3f5ebfa4a
+ size 191696311
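
The three lines above are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes, while the weights live in LFS storage. A short sketch of reading those fields is shown below; the helper name `parse_lfs_pointer` is hypothetical, and it handles only the three keys that appear in this pointer.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d9fb42a747fea42436c174c211983782987b706f44b75d6f7cfd02e3f5ebfa4a
size 191696311
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

After cloning with LFS enabled, comparing the downloaded file's size and SHA-256 against these values is a quick way to confirm that the real checkpoint, rather than the pointer text, is on disk.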
data/target_ligands_validated.json ADDED
@@ -0,0 +1,1228 @@
1
+ {
2
+ "NR-AR": [
3
+ {
4
+ "name": "TESTOSTERONE ENANTHATE",
5
+ "smiles": "CCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
6
+ "chembl_id": "CHEMBL1200335",
7
+ "targets": "Androgen Receptor"
8
+ },
9
+ {
10
+ "name": "NANDROLONE PHENPROPIONATE",
11
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@H]43)[C@@H]1CC[C@@H]2OC(=O)CCc1ccccc1",
12
+ "chembl_id": "CHEMBL1200412",
13
+ "targets": "Androgen Receptor"
14
+ },
15
+ {
16
+ "name": "OXYMETHOLONE",
17
+ "smiles": "C[C@]12C/C(=C/O)C(=O)C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
18
+ "chembl_id": "CHEMBL1200585",
19
+ "targets": "Androgen Receptor"
20
+ },
21
+ {
22
+ "name": "ETHYLESTRENOL",
23
+ "smiles": "CC[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3CC[C@@]21C",
24
+ "chembl_id": "CHEMBL1200623",
25
+ "targets": "Androgen Receptor"
26
+ },
27
+ {
28
+ "name": "NANDROLONE DECANOATE",
29
+ "smiles": "CCCCCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@@H]4[C@H]3CC[C@]12C",
30
+ "chembl_id": "CHEMBL1200946",
31
+ "targets": "Androgen Receptor"
32
+ },
33
+ {
34
+ "name": "ENOBOSARM",
35
+ "smiles": "C[C@](O)(COc1ccc(C#N)cc1)C(=O)Nc1ccc(C#N)c(C(F)(F)F)c1",
36
+ "chembl_id": "CHEMBL1738889",
37
+ "targets": "Androgen Receptor"
38
+ },
39
+ {
40
+ "name": "PRUXELUTAMIDE",
41
+ "smiles": "CC1(C)C(=O)N(c2ccc(C#N)c(C(F)(F)F)c2F)C(=S)N1c1ccc(CCCc2ncco2)nc1",
42
+ "chembl_id": "CHEMBL4594417",
43
+ "targets": "Androgen Receptor"
44
+ },
45
+ {
46
+ "name": "TESTOSTERONE PROPIONATE",
47
+ "smiles": "CCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
48
+ "chembl_id": "CHEMBL1170",
49
+ "targets": "Androgen Receptor"
50
+ },
51
+ {
52
+ "name": "FLUOXYMESTERONE",
53
+ "smiles": "C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@@]3(F)[C@@H](O)C[C@@]21C",
54
+ "chembl_id": "CHEMBL1445",
55
+ "targets": "Androgen Receptor"
56
+ },
57
+ {
58
+ "name": "ENZALUTAMIDE",
59
+ "smiles": "CNC(=O)c1ccc(N2C(=S)N(c3ccc(C#N)c(C(F)(F)F)c3)C(=O)C2(C)C)cc1F",
60
+ "chembl_id": "CHEMBL1082407",
61
+ "targets": "Androgen Receptor"
62
+ },
63
+ {
64
+ "name": "STANOZOLOL",
65
+ "smiles": "C[C@]12Cc3c[nH]nc3C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
66
+ "chembl_id": "CHEMBL2079587",
67
+ "targets": "Androgen Receptor"
68
+ },
69
+ {
70
+ "name": "TESTOSTERONE UNDECANOATE",
71
+ "smiles": "CCCCCCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
72
+ "chembl_id": "CHEMBL2107067",
73
+ "targets": "Androgen Receptor"
74
+ },
75
+ {
76
+ "name": "DAROLUTAMIDE",
77
+ "smiles": "CC(O)c1cc(C(=O)N[C@@H](C)Cn2ccc(-c3ccc(C#N)c(Cl)c3)n2)n[nH]1",
78
+ "chembl_id": "CHEMBL4297185",
79
+ "targets": "Androgen Receptor"
80
+ },
81
+ {
82
+ "name": "BICALUTAMIDE",
83
+ "smiles": "CC(O)(CS(=O)(=O)c1ccc(F)cc1)C(=O)Nc1ccc(C#N)c(C(F)(F)F)c1",
84
+ "chembl_id": "CHEMBL409",
85
+ "targets": "Androgen Receptor"
86
+ },
87
+ {
88
+ "name": "FLUTAMIDE",
89
+ "smiles": "CC(C)C(=O)Nc1ccc([N+](=O)[O-])c(C(F)(F)F)c1",
90
+ "chembl_id": "CHEMBL806",
91
+ "targets": "Androgen Receptor"
92
+ },
93
+ {
94
+ "name": "NILUTAMIDE",
95
+ "smiles": "CC1(C)NC(=O)N(c2ccc([N+](=O)[O-])c(C(F)(F)F)c2)C1=O",
96
+ "chembl_id": "CHEMBL1274",
97
+ "targets": "Androgen Receptor"
98
+ },
99
+ {
100
+ "name": "METHYLTESTOSTERONE",
101
+ "smiles": "C[C@]12CCC(=O)C=C1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
102
+ "chembl_id": "CHEMBL1395",
103
+ "targets": "Androgen Receptor"
104
+ },
105
+ {
106
+ "name": "TESTOSTERONE",
107
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@]43C)[C@@H]1CC[C@@H]2O",
108
+ "chembl_id": "CHEMBL386630",
109
+ "targets": "Androgen Receptor"
110
+ },
111
+ {
112
+ "name": "OXANDROLONE",
113
+ "smiles": "C[C@]12COC(=O)C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
114
+ "chembl_id": "CHEMBL1200436",
115
+ "targets": "Androgen Receptor"
116
+ },
117
+ {
118
+ "name": "DROMOSTANOLONE PROPIONATE",
119
+ "smiles": "CCC(=O)O[C@H]1CC[C@H]2[C@@H]3CC[C@H]4CC(=O)[C@H](C)C[C@]4(C)[C@H]3CC[C@]12C",
120
+ "chembl_id": "CHEMBL1201048",
121
+ "targets": "Androgen Receptor"
122
+ },
123
+ {
124
+ "name": "TESTOSTERONE CYPIONATE",
125
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@]43C)[C@@H]1CC[C@@H]2OC(=O)CCC1CCCC1",
126
+ "chembl_id": "CHEMBL1201101",
127
+ "targets": "Androgen Receptor"
128
+ },
129
+ {
130
+ "name": "METHANDROSTENOLONE",
131
+ "smiles": "C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
132
+ "chembl_id": "CHEMBL1418176",
133
+ "targets": "Androgen Receptor"
134
+ },
135
+ {
136
+ "name": "APALUTAMIDE",
137
+ "smiles": "CNC(=O)c1ccc(N2C(=S)N(c3cnc(C#N)c(C(F)(F)F)c3)C(=O)C23CCC3)cc1F",
138
+ "chembl_id": "CHEMBL3183409",
139
+ "targets": "Androgen Receptor"
140
+ },
141
+ {
142
+ "name": "CLASCOTERONE",
143
+ "smiles": "CCC(=O)O[C@]1(C(=O)CO)CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@@]21C",
144
+ "chembl_id": "CHEMBL3590187",
145
+ "targets": "Androgen Receptor"
146
+ },
147
+ {
148
+ "name": "SHR3680",
149
+ "smiles": "CC1(C)C(=O)N(c2ccc(C#N)c(C(F)(F)F)c2)C(=S)N1c1ccc(OC[C@@H](O)CO)cc1",
150
+ "chembl_id": "CHEMBL4650276",
151
+ "targets": "Androgen Receptor"
152
+ },
153
+ {
154
+ "name": "DANAZOL",
155
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=Cc5oncc5C[C@]4(C)[C@H]3CC[C@@]21C",
156
+ "chembl_id": "CHEMBL1479",
157
+ "targets": "Androgen Receptor,Progesterone receptor"
158
+ },
159
+ {
160
+ "name": "CYPROTERONE ACETATE",
161
+ "smiles": "CC(=O)O[C@]1(C(C)=O)CC[C@H]2[C@@H]3C=C(Cl)C4=CC(=O)[C@@H]5C[C@@H]5[C@]4(C)[C@H]3CC[C@@]21C",
162
+ "chembl_id": "CHEMBL139835",
163
+ "targets": "Androgen Receptor,Glucocorticoid receptor,Progesterone receptor"
164
+ },
165
+ {
166
+ "name": "GALETERONE",
167
+ "smiles": "C[C@]12CC[C@H](O)CC1=CC[C@@H]1[C@@H]2CC[C@]2(C)C(n3cnc4ccccc43)=CC[C@@H]12",
168
+ "chembl_id": "CHEMBL2105738",
169
+ "targets": "Androgen Receptor,Cytochrome P450 17A1"
170
+ }
171
+ ],
172
+ "NR-AR-LBD": [
173
+ {
174
+ "name": "TESTOSTERONE ENANTHATE",
175
+ "smiles": "CCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
176
+ "chembl_id": "CHEMBL1200335",
177
+ "targets": "Androgen Receptor"
178
+ },
179
+ {
180
+ "name": "NANDROLONE PHENPROPIONATE",
181
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@H]43)[C@@H]1CC[C@@H]2OC(=O)CCc1ccccc1",
182
+ "chembl_id": "CHEMBL1200412",
183
+ "targets": "Androgen Receptor"
184
+ },
185
+ {
186
+ "name": "OXYMETHOLONE",
187
+ "smiles": "C[C@]12C/C(=C/O)C(=O)C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
188
+ "chembl_id": "CHEMBL1200585",
189
+ "targets": "Androgen Receptor"
190
+ },
191
+ {
192
+ "name": "ETHYLESTRENOL",
193
+ "smiles": "CC[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCC[C@@H]4[C@H]3CC[C@@]21C",
194
+ "chembl_id": "CHEMBL1200623",
195
+ "targets": "Androgen Receptor"
196
+ },
197
+ {
198
+ "name": "NANDROLONE DECANOATE",
199
+ "smiles": "CCCCCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@@H]4[C@H]3CC[C@]12C",
200
+ "chembl_id": "CHEMBL1200946",
201
+ "targets": "Androgen Receptor"
202
+ },
203
+ {
204
+ "name": "ENOBOSARM",
205
+ "smiles": "C[C@](O)(COc1ccc(C#N)cc1)C(=O)Nc1ccc(C#N)c(C(F)(F)F)c1",
206
+ "chembl_id": "CHEMBL1738889",
207
+ "targets": "Androgen Receptor"
208
+ },
209
+ {
210
+ "name": "PRUXELUTAMIDE",
211
+ "smiles": "CC1(C)C(=O)N(c2ccc(C#N)c(C(F)(F)F)c2F)C(=S)N1c1ccc(CCCc2ncco2)nc1",
212
+ "chembl_id": "CHEMBL4594417",
213
+ "targets": "Androgen Receptor"
214
+ },
215
+ {
216
+ "name": "TESTOSTERONE PROPIONATE",
217
+ "smiles": "CCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
218
+ "chembl_id": "CHEMBL1170",
219
+ "targets": "Androgen Receptor"
220
+ },
221
+ {
222
+ "name": "FLUOXYMESTERONE",
223
+ "smiles": "C[C@]1(O)CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@@]3(F)[C@@H](O)C[C@@]21C",
224
+ "chembl_id": "CHEMBL1445",
225
+ "targets": "Androgen Receptor"
226
+ },
227
+ {
228
+ "name": "ENZALUTAMIDE",
229
+ "smiles": "CNC(=O)c1ccc(N2C(=S)N(c3ccc(C#N)c(C(F)(F)F)c3)C(=O)C2(C)C)cc1F",
230
+ "chembl_id": "CHEMBL1082407",
231
+ "targets": "Androgen Receptor"
232
+ },
233
+ {
234
+ "name": "STANOZOLOL",
235
+ "smiles": "C[C@]12Cc3c[nH]nc3C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
236
+ "chembl_id": "CHEMBL2079587",
237
+ "targets": "Androgen Receptor"
238
+ },
239
+ {
240
+ "name": "TESTOSTERONE UNDECANOATE",
241
+ "smiles": "CCCCCCCCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@]12C",
242
+ "chembl_id": "CHEMBL2107067",
243
+ "targets": "Androgen Receptor"
244
+ },
245
+ {
246
+ "name": "DAROLUTAMIDE",
247
+ "smiles": "CC(O)c1cc(C(=O)N[C@@H](C)Cn2ccc(-c3ccc(C#N)c(Cl)c3)n2)n[nH]1",
248
+ "chembl_id": "CHEMBL4297185",
249
+ "targets": "Androgen Receptor"
250
+ },
251
+ {
252
+ "name": "BICALUTAMIDE",
253
+ "smiles": "CC(O)(CS(=O)(=O)c1ccc(F)cc1)C(=O)Nc1ccc(C#N)c(C(F)(F)F)c1",
254
+ "chembl_id": "CHEMBL409",
255
+ "targets": "Androgen Receptor"
256
+ },
257
+ {
258
+ "name": "FLUTAMIDE",
259
+ "smiles": "CC(C)C(=O)Nc1ccc([N+](=O)[O-])c(C(F)(F)F)c1",
260
+ "chembl_id": "CHEMBL806",
261
+ "targets": "Androgen Receptor"
262
+ },
263
+ {
264
+ "name": "NILUTAMIDE",
265
+ "smiles": "CC1(C)NC(=O)N(c2ccc([N+](=O)[O-])c(C(F)(F)F)c2)C1=O",
266
+ "chembl_id": "CHEMBL1274",
267
+ "targets": "Androgen Receptor"
268
+ },
269
+ {
270
+ "name": "METHYLTESTOSTERONE",
271
+ "smiles": "C[C@]12CCC(=O)C=C1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
272
+ "chembl_id": "CHEMBL1395",
273
+ "targets": "Androgen Receptor"
274
+ },
275
+ {
276
+ "name": "TESTOSTERONE",
277
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@]43C)[C@@H]1CC[C@@H]2O",
278
+ "chembl_id": "CHEMBL386630",
279
+ "targets": "Androgen Receptor"
280
+ },
281
+ {
282
+ "name": "OXANDROLONE",
283
+ "smiles": "C[C@]12COC(=O)C[C@@H]1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
284
+ "chembl_id": "CHEMBL1200436",
285
+ "targets": "Androgen Receptor"
286
+ },
287
+ {
288
+ "name": "DROMOSTANOLONE PROPIONATE",
289
+ "smiles": "CCC(=O)O[C@H]1CC[C@H]2[C@@H]3CC[C@H]4CC(=O)[C@H](C)C[C@]4(C)[C@H]3CC[C@]12C",
290
+ "chembl_id": "CHEMBL1201048",
291
+ "targets": "Androgen Receptor"
292
+ },
293
+ {
294
+ "name": "TESTOSTERONE CYPIONATE",
295
+ "smiles": "C[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@@]43C)[C@@H]1CC[C@@H]2OC(=O)CCC1CCCC1",
296
+ "chembl_id": "CHEMBL1201101",
297
+ "targets": "Androgen Receptor"
298
+ },
299
+ {
300
+ "name": "METHANDROSTENOLONE",
301
+ "smiles": "C[C@]12C=CC(=O)C=C1CC[C@@H]1[C@@H]2CC[C@@]2(C)[C@H]1CC[C@]2(C)O",
302
+ "chembl_id": "CHEMBL1418176",
303
+ "targets": "Androgen Receptor"
304
+ },
305
+ {
306
+ "name": "APALUTAMIDE",
307
+ "smiles": "CNC(=O)c1ccc(N2C(=S)N(c3cnc(C#N)c(C(F)(F)F)c3)C(=O)C23CCC3)cc1F",
308
+ "chembl_id": "CHEMBL3183409",
309
+ "targets": "Androgen Receptor"
310
+ },
311
+ {
312
+ "name": "CLASCOTERONE",
313
+ "smiles": "CCC(=O)O[C@]1(C(=O)CO)CC[C@H]2[C@@H]3CCC4=CC(=O)CC[C@]4(C)[C@H]3CC[C@@]21C",
314
+ "chembl_id": "CHEMBL3590187",
315
+ "targets": "Androgen Receptor"
316
+ },
317
+ {
318
+ "name": "SHR3680",
319
+ "smiles": "CC1(C)C(=O)N(c2ccc(C#N)c(C(F)(F)F)c2)C(=S)N1c1ccc(OC[C@@H](O)CO)cc1",
320
+ "chembl_id": "CHEMBL4650276",
321
+ "targets": "Androgen Receptor"
322
+ },
323
+ {
324
+ "name": "DANAZOL",
325
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCC4=Cc5oncc5C[C@]4(C)[C@H]3CC[C@@]21C",
326
+ "chembl_id": "CHEMBL1479",
327
+ "targets": "Androgen Receptor,Progesterone receptor"
328
+ },
329
+ {
330
+ "name": "CYPROTERONE ACETATE",
331
+ "smiles": "CC(=O)O[C@]1(C(C)=O)CC[C@H]2[C@@H]3C=C(Cl)C4=CC(=O)[C@@H]5C[C@@H]5[C@]4(C)[C@H]3CC[C@@]21C",
332
+ "chembl_id": "CHEMBL139835",
333
+ "targets": "Androgen Receptor,Glucocorticoid receptor,Progesterone receptor"
334
+ },
335
+ {
336
+ "name": "GALETERONE",
337
+ "smiles": "C[C@]12CC[C@H](O)CC1=CC[C@@H]1[C@@H]2CC[C@]2(C)C(n3cnc4ccccc43)=CC[C@@H]12",
338
+ "chembl_id": "CHEMBL2105738",
339
+ "targets": "Androgen Receptor,Cytochrome P450 17A1"
340
+ }
341
+ ],
342
+ "NR-ER": [
343
+ {
344
+ "name": "ACOLBIFENE",
345
+ "smiles": "CC1=C(c2ccc(O)cc2)[C@H](c2ccc(OCCN3CCCCC3)cc2)Oc2cc(O)ccc21",
346
+ "chembl_id": "CHEMBL68055",
347
+ "targets": "Estrogen receptor"
348
+ },
349
+ {
350
+ "name": "ARZOXIFENE",
351
+ "smiles": "COc1ccc(-c2sc3cc(O)ccc3c2Oc2ccc(OCCN3CCCCC3)cc2)cc1",
352
+ "chembl_id": "CHEMBL226267",
353
+ "targets": "Estrogen receptor"
354
+ },
355
+ {
356
+ "name": "TOREMIFENE CITRATE",
357
+ "smiles": "CN(C)CCOc1ccc(/C(=C(/CCCl)c2ccccc2)c2ccccc2)cc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
358
+ "chembl_id": "CHEMBL1200675",
359
+ "targets": "Estrogen receptor"
360
+ },
361
+ {
362
+ "name": "ESTETROL",
363
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1[C@@H](O)[C@@H](O)[C@@H]2O",
364
+ "chembl_id": "CHEMBL1230314",
365
+ "targets": "Estrogen receptor"
366
+ },
367
+ {
368
+ "name": "DROLOXIFENE",
369
+ "smiles": "CC/C(=C(/c1ccc(OCCN(C)C)cc1)c1cccc(O)c1)c1ccccc1",
370
+ "chembl_id": "CHEMBL487",
371
+ "targets": "Estrogen receptor"
372
+ },
373
+ {
374
+ "name": "LASOFOXIFENE",
375
+ "smiles": "Oc1ccc2c(c1)CC[C@H](c1ccccc1)[C@@H]2c1ccc(OCCN2CCCC2)cc1",
376
+ "chembl_id": "CHEMBL328190",
377
+ "targets": "Estrogen receptor"
378
+ },
379
+ {
380
+ "name": "CYCLOFENIL",
381
+ "smiles": "CC(=O)Oc1ccc(C(=C2CCCCC2)c2ccc(OC(C)=O)cc2)cc1",
382
+ "chembl_id": "CHEMBL141305",
383
+ "targets": "Estrogen receptor"
384
+ },
385
+ {
386
+ "name": "FULVESTRANT",
387
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4C[C@@H](CCCCCCCCC[S+]([O-])CCCC(F)(F)C(F)(F)F)[C@H]3[C@@H]1CC[C@@H]2O",
388
+ "chembl_id": "CHEMBL1358",
389
+ "targets": "Estrogen receptor"
390
+ },
391
+ {
392
+ "name": "ESTRIOL",
393
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1C[C@@H](O)[C@@H]2O",
394
+ "chembl_id": "CHEMBL193482",
395
+ "targets": "Estrogen receptor"
396
+ },
397
+ {
398
+ "name": "DIETHYLSTILBESTROL DIPHOSPHATE",
399
+ "smiles": "CC/C(=C(/CC)c1ccc(OP(=O)(O)O)cc1)c1ccc(OP(=O)(O)O)cc1",
400
+ "chembl_id": "CHEMBL1200598",
401
+ "targets": "Estrogen receptor"
402
+ },
403
+ {
404
+ "name": "OSPEMIFENE",
405
+ "smiles": "OCCOc1ccc(/C(=C(/CCCl)c2ccccc2)c2ccccc2)cc1",
406
+ "chembl_id": "CHEMBL2105395",
407
+ "targets": "Estrogen receptor"
408
+ },
409
+ {
410
+ "name": "AFIMOXIFENE",
411
+ "smiles": "CC/C(=C(\\c1ccc(O)cc1)c1ccc(OCCN(C)C)cc1)c1ccccc1",
412
+ "chembl_id": "CHEMBL489",
413
+ "targets": "Estrogen receptor,Estrogen-related receptor gamma"
414
+ },
415
+ {
416
+ "name": "QUINESTROL",
417
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(OC5CCCC5)ccc4[C@H]3CC[C@@]21C",
418
+ "chembl_id": "CHEMBL1201165",
419
+ "targets": "Estrogen receptor"
420
+ },
421
+ {
422
+ "name": "BAZEDOXIFENE ACETATE",
423
+ "smiles": "CC(=O)O.Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2ccc(O)cc12",
424
+ "chembl_id": "CHEMBL2106615",
425
+ "targets": "Estrogen receptor"
426
+ },
427
+ {
428
+ "name": "DIETHYLSTILBESTROL",
429
+ "smiles": "CC/C(=C(/CC)c1ccc(O)cc1)c1ccc(O)cc1",
430
+ "chembl_id": "CHEMBL411",
431
+ "targets": "Estrogen receptor alpha"
432
+ },
433
+ {
434
+ "name": "ETHINYL ESTRADIOL",
435
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(O)ccc4[C@H]3CC[C@@]21C",
436
+ "chembl_id": "CHEMBL691",
437
+ "targets": "Estrogen receptor alpha"
438
+ },
439
+ {
440
+ "name": "TAMOXIFEN CITRATE",
441
+ "smiles": "CC/C(=C(\\c1ccccc1)c1ccc(OCCN(C)C)cc1)c1ccccc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
442
+ "chembl_id": "CHEMBL786",
443
+ "targets": "Estrogen receptor alpha"
444
+ },
445
+ {
446
+ "name": "DIENESTROL",
447
+ "smiles": "C/C=C(C(=C/C)/c1ccc(O)cc1)\\c1ccc(O)cc1",
448
+ "chembl_id": "CHEMBL1018",
449
+ "targets": "Estrogen receptor alpha"
450
+ },
451
+ {
452
+ "name": "AMCENESTRANT",
453
+ "smiles": "O=C(O)c1ccc2c(c1)CCCC(c1ccc(Cl)cc1Cl)=C2c1ccc(O[C@H]2CCN(CCCF)C2)cc1",
454
+ "chembl_id": "CHEMBL4475463",
455
+ "targets": "Estrogen receptor alpha"
456
+ },
457
+ {
458
+ "name": "ESTRADIOL",
459
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CC[C@@H]2O",
460
+ "chembl_id": "CHEMBL135",
461
+ "targets": "Estrogen receptor alpha"
462
+ },
463
+ {
464
+ "name": "ENCLOMIPHENE",
465
+ "smiles": "CCN(CC)CCOc1ccc(/C(=C(/Cl)c2ccccc2)c2ccccc2)cc1",
466
+ "chembl_id": "CHEMBL954",
467
+ "targets": "Estrogen receptor alpha"
468
+ },
469
+ {
470
+ "name": "ELACESTRANT HYDROCHLORIDE",
471
+ "smiles": "CCNCCc1ccc(CN(CC)c2cc(OC)ccc2[C@@H]2CCc3cc(O)ccc3C2)cc1.Cl.Cl",
472
+ "chembl_id": "CHEMBL4594273",
473
+ "targets": "Estrogen receptor alpha"
474
+ },
475
+ {
476
+ "name": "ALLYLESTRENOL",
477
+ "smiles": "C=CC[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCCC4[C@H]3CC[C@@]21C",
478
+ "chembl_id": "CHEMBL2105618",
479
+ "targets": "Estrogen receptor,Progesterone receptor"
480
+ },
481
+ {
482
+ "name": "RALOXIFENE HYDROCHLORIDE",
483
+ "smiles": "Cl.O=C(c1ccc(OCCN2CCCCC2)cc1)c1c(-c2ccc(O)cc2)sc2cc(O)ccc12",
484
+ "chembl_id": "CHEMBL1116",
485
+ "targets": "Estrogen receptor beta"
486
+ },
487
+ {
488
+ "name": "ESTRONE",
489
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CCC2=O",
490
+ "chembl_id": "CHEMBL1405",
491
+ "targets": "Estrogen receptor alpha"
492
+ },
493
+ {
494
+ "name": "ESTRADIOL VALERATE",
495
+ "smiles": "CCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCc4cc(O)ccc4[C@H]3CC[C@]12C",
496
+ "chembl_id": "CHEMBL1511",
497
+ "targets": "Estrogen receptor alpha"
498
+ },
499
+ {
500
+ "name": "ESTRADIOL ACETATE",
501
+ "smiles": "CC(=O)Oc1ccc2c(c1)CC[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O)CC[C@@H]12",
502
+ "chembl_id": "CHEMBL1200430",
503
+ "targets": "Estrogen receptor alpha"
504
+ },
505
+ {
506
+ "name": "CHLOROTRIANISENE",
507
+ "smiles": "COc1ccc(C(Cl)=C(c2ccc(OC)cc2)c2ccc(OC)cc2)cc1",
508
+ "chembl_id": "CHEMBL1200761",
509
+ "targets": "Estrogen receptor beta"
510
+ },
511
+ {
512
+ "name": "ESTRADIOL CYPIONATE",
513
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CC[C@@H]2OC(=O)CCC1CCCC1",
514
+ "chembl_id": "CHEMBL1200973",
515
+ "targets": "Estrogen receptor alpha"
516
+ },
517
+ {
518
+ "name": "ESTROPIPATE",
519
+ "smiles": "C1CNCCN1.C[C@]12CC[C@@H]3c4ccc(OS(=O)(=O)O)cc4CC[C@H]3[C@@H]1CCC2=O",
520
+ "chembl_id": "CHEMBL1200980",
521
+ "targets": "Estrogen receptor alpha"
522
+ },
523
+ {
524
+ "name": "MESTRANOL",
525
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(OC)ccc4[C@H]3CC[C@@]21C",
526
+ "chembl_id": "CHEMBL1201151",
527
+ "targets": "Estrogen receptor alpha"
528
+ },
529
+ {
530
+ "name": "CLOMIPHENE CITRATE",
531
+ "smiles": "CCN(CC)CCOc1ccc(C(=C(Cl)c2ccccc2)c2ccccc2)cc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
532
+ "chembl_id": "CHEMBL3185958",
533
+ "targets": "Estrogen receptor alpha"
534
+ },
535
+ {
536
+ "name": "ELACESTRANT",
537
+ "smiles": "CCNCCc1ccc(CN(CC)c2cc(OC)ccc2[C@@H]2CCc3cc(O)ccc3C2)cc1",
538
+ "chembl_id": "CHEMBL4297509",
539
+ "targets": "Estrogen receptor alpha"
540
+ },
541
+ {
542
+ "name": "GIREDESTRANT",
543
+ "smiles": "C[C@@H]1Cc2c([nH]c3ccccc23)[C@@H](c2c(F)cc(NC3CN(CCCF)C3)cc2F)N1CC(F)(F)CO",
544
+ "chembl_id": "CHEMBL4650316",
545
+ "targets": "Estrogen receptor alpha"
546
+ },
547
+ {
548
+ "name": "CAMIZESTRANT",
549
+ "smiles": "C[C@@H]1Cc2c(ccc3[nH]ncc23)[C@@H](c2ccc(NC3CN(CCCF)C3)cn2)N1CC(F)(F)F",
550
+ "chembl_id": "CHEMBL4650365",
551
+ "targets": "Estrogen receptor alpha"
552
+ },
553
+ {
554
+ "name": "ESTRAMUSTINE PHOSPHATE SODIUM",
555
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(OC(=O)N(CCCl)CCCl)cc4CC[C@H]3[C@@H]1CC[C@@H]2OP(=O)([O-])[O-].[Na+].[Na+]",
556
+ "chembl_id": "CHEMBL1200721",
557
+ "targets": "DNA,Estrogen receptor beta"
558
+ }
559
+ ],
560
+ "NR-ER-LBD": [
561
+ {
562
+ "name": "ACOLBIFENE",
563
+ "smiles": "CC1=C(c2ccc(O)cc2)[C@H](c2ccc(OCCN3CCCCC3)cc2)Oc2cc(O)ccc21",
564
+ "chembl_id": "CHEMBL68055",
565
+ "targets": "Estrogen receptor"
566
+ },
567
+ {
568
+ "name": "ARZOXIFENE",
569
+ "smiles": "COc1ccc(-c2sc3cc(O)ccc3c2Oc2ccc(OCCN3CCCCC3)cc2)cc1",
570
+ "chembl_id": "CHEMBL226267",
571
+ "targets": "Estrogen receptor"
572
+ },
573
+ {
574
+ "name": "TOREMIFENE CITRATE",
575
+ "smiles": "CN(C)CCOc1ccc(/C(=C(/CCCl)c2ccccc2)c2ccccc2)cc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
576
+ "chembl_id": "CHEMBL1200675",
577
+ "targets": "Estrogen receptor"
578
+ },
579
+ {
580
+ "name": "ESTETROL",
581
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1[C@@H](O)[C@@H](O)[C@@H]2O",
582
+ "chembl_id": "CHEMBL1230314",
583
+ "targets": "Estrogen receptor"
584
+ },
585
+ {
586
+ "name": "DROLOXIFENE",
587
+ "smiles": "CC/C(=C(/c1ccc(OCCN(C)C)cc1)c1cccc(O)c1)c1ccccc1",
588
+ "chembl_id": "CHEMBL487",
589
+ "targets": "Estrogen receptor"
590
+ },
591
+ {
592
+ "name": "LASOFOXIFENE",
593
+ "smiles": "Oc1ccc2c(c1)CC[C@H](c1ccccc1)[C@@H]2c1ccc(OCCN2CCCC2)cc1",
594
+ "chembl_id": "CHEMBL328190",
595
+ "targets": "Estrogen receptor"
596
+ },
597
+ {
598
+ "name": "CYCLOFENIL",
599
+ "smiles": "CC(=O)Oc1ccc(C(=C2CCCCC2)c2ccc(OC(C)=O)cc2)cc1",
600
+ "chembl_id": "CHEMBL141305",
601
+ "targets": "Estrogen receptor"
602
+ },
603
+ {
604
+ "name": "FULVESTRANT",
605
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4C[C@@H](CCCCCCCCC[S+]([O-])CCCC(F)(F)C(F)(F)F)[C@H]3[C@@H]1CC[C@@H]2O",
606
+ "chembl_id": "CHEMBL1358",
607
+ "targets": "Estrogen receptor"
608
+ },
609
+ {
610
+ "name": "ESTRIOL",
611
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1C[C@@H](O)[C@@H]2O",
612
+ "chembl_id": "CHEMBL193482",
613
+ "targets": "Estrogen receptor"
614
+ },
615
+ {
616
+ "name": "DIETHYLSTILBESTROL DIPHOSPHATE",
617
+ "smiles": "CC/C(=C(/CC)c1ccc(OP(=O)(O)O)cc1)c1ccc(OP(=O)(O)O)cc1",
618
+ "chembl_id": "CHEMBL1200598",
619
+ "targets": "Estrogen receptor"
620
+ },
621
+ {
622
+ "name": "OSPEMIFENE",
623
+ "smiles": "OCCOc1ccc(/C(=C(/CCCl)c2ccccc2)c2ccccc2)cc1",
624
+ "chembl_id": "CHEMBL2105395",
625
+ "targets": "Estrogen receptor"
626
+ },
627
+ {
628
+ "name": "AFIMOXIFENE",
629
+ "smiles": "CC/C(=C(\\c1ccc(O)cc1)c1ccc(OCCN(C)C)cc1)c1ccccc1",
630
+ "chembl_id": "CHEMBL489",
631
+ "targets": "Estrogen receptor,Estrogen-related receptor gamma"
632
+ },
633
+ {
634
+ "name": "QUINESTROL",
635
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(OC5CCCC5)ccc4[C@H]3CC[C@@]21C",
636
+ "chembl_id": "CHEMBL1201165",
637
+ "targets": "Estrogen receptor"
638
+ },
639
+ {
640
+ "name": "BAZEDOXIFENE ACETATE",
641
+ "smiles": "CC(=O)O.Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2ccc(O)cc12",
642
+ "chembl_id": "CHEMBL2106615",
643
+ "targets": "Estrogen receptor"
644
+ },
645
+ {
646
+ "name": "DIETHYLSTILBESTROL",
647
+ "smiles": "CC/C(=C(/CC)c1ccc(O)cc1)c1ccc(O)cc1",
648
+ "chembl_id": "CHEMBL411",
649
+ "targets": "Estrogen receptor alpha"
650
+ },
651
+ {
652
+ "name": "ETHINYL ESTRADIOL",
653
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(O)ccc4[C@H]3CC[C@@]21C",
654
+ "chembl_id": "CHEMBL691",
655
+ "targets": "Estrogen receptor alpha"
656
+ },
657
+ {
658
+ "name": "TAMOXIFEN CITRATE",
659
+ "smiles": "CC/C(=C(\\c1ccccc1)c1ccc(OCCN(C)C)cc1)c1ccccc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
660
+ "chembl_id": "CHEMBL786",
661
+ "targets": "Estrogen receptor alpha"
662
+ },
663
+ {
664
+ "name": "DIENESTROL",
665
+ "smiles": "C/C=C(C(=C/C)/c1ccc(O)cc1)\\c1ccc(O)cc1",
666
+ "chembl_id": "CHEMBL1018",
667
+ "targets": "Estrogen receptor alpha"
668
+ },
669
+ {
670
+ "name": "AMCENESTRANT",
671
+ "smiles": "O=C(O)c1ccc2c(c1)CCCC(c1ccc(Cl)cc1Cl)=C2c1ccc(O[C@H]2CCN(CCCF)C2)cc1",
672
+ "chembl_id": "CHEMBL4475463",
673
+ "targets": "Estrogen receptor alpha"
674
+ },
675
+ {
676
+ "name": "ESTRADIOL",
677
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CC[C@@H]2O",
678
+ "chembl_id": "CHEMBL135",
679
+ "targets": "Estrogen receptor alpha"
680
+ },
681
+ {
682
+ "name": "ENCLOMIPHENE",
683
+ "smiles": "CCN(CC)CCOc1ccc(/C(=C(/Cl)c2ccccc2)c2ccccc2)cc1",
684
+ "chembl_id": "CHEMBL954",
685
+ "targets": "Estrogen receptor alpha"
686
+ },
687
+ {
688
+ "name": "ELACESTRANT HYDROCHLORIDE",
689
+ "smiles": "CCNCCc1ccc(CN(CC)c2cc(OC)ccc2[C@@H]2CCc3cc(O)ccc3C2)cc1.Cl.Cl",
690
+ "chembl_id": "CHEMBL4594273",
691
+ "targets": "Estrogen receptor alpha"
692
+ },
693
+ {
694
+ "name": "ALLYLESTRENOL",
695
+ "smiles": "C=CC[C@]1(O)CC[C@H]2[C@@H]3CCC4=CCCCC4[C@H]3CC[C@@]21C",
696
+ "chembl_id": "CHEMBL2105618",
697
+ "targets": "Estrogen receptor,Progesterone receptor"
698
+ },
699
+ {
700
+ "name": "RALOXIFENE HYDROCHLORIDE",
701
+ "smiles": "Cl.O=C(c1ccc(OCCN2CCCCC2)cc1)c1c(-c2ccc(O)cc2)sc2cc(O)ccc12",
702
+ "chembl_id": "CHEMBL1116",
703
+ "targets": "Estrogen receptor beta"
704
+ },
705
+ {
706
+ "name": "ESTRONE",
707
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CCC2=O",
708
+ "chembl_id": "CHEMBL1405",
709
+ "targets": "Estrogen receptor alpha"
710
+ },
711
+ {
712
+ "name": "ESTRADIOL VALERATE",
713
+ "smiles": "CCCCC(=O)O[C@H]1CC[C@H]2[C@@H]3CCc4cc(O)ccc4[C@H]3CC[C@]12C",
714
+ "chembl_id": "CHEMBL1511",
715
+ "targets": "Estrogen receptor alpha"
716
+ },
717
+ {
718
+ "name": "ESTRADIOL ACETATE",
719
+ "smiles": "CC(=O)Oc1ccc2c(c1)CC[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](O)CC[C@@H]12",
720
+ "chembl_id": "CHEMBL1200430",
721
+ "targets": "Estrogen receptor alpha"
722
+ },
723
+ {
724
+ "name": "CHLOROTRIANISENE",
725
+ "smiles": "COc1ccc(C(Cl)=C(c2ccc(OC)cc2)c2ccc(OC)cc2)cc1",
726
+ "chembl_id": "CHEMBL1200761",
727
+ "targets": "Estrogen receptor beta"
728
+ },
729
+ {
730
+ "name": "ESTRADIOL CYPIONATE",
731
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(O)cc4CC[C@H]3[C@@H]1CC[C@@H]2OC(=O)CCC1CCCC1",
732
+ "chembl_id": "CHEMBL1200973",
733
+ "targets": "Estrogen receptor alpha"
734
+ },
735
+ {
736
+ "name": "ESTROPIPATE",
737
+ "smiles": "C1CNCCN1.C[C@]12CC[C@@H]3c4ccc(OS(=O)(=O)O)cc4CC[C@H]3[C@@H]1CCC2=O",
738
+ "chembl_id": "CHEMBL1200980",
739
+ "targets": "Estrogen receptor alpha"
740
+ },
741
+ {
742
+ "name": "MESTRANOL",
743
+ "smiles": "C#C[C@]1(O)CC[C@H]2[C@@H]3CCc4cc(OC)ccc4[C@H]3CC[C@@]21C",
744
+ "chembl_id": "CHEMBL1201151",
745
+ "targets": "Estrogen receptor alpha"
746
+ },
747
+ {
748
+ "name": "CLOMIPHENE CITRATE",
749
+ "smiles": "CCN(CC)CCOc1ccc(C(=C(Cl)c2ccccc2)c2ccccc2)cc1.O=C(O)CC(O)(CC(=O)O)C(=O)O",
750
+ "chembl_id": "CHEMBL3185958",
751
+ "targets": "Estrogen receptor alpha"
752
+ },
753
+ {
754
+ "name": "ELACESTRANT",
755
+ "smiles": "CCNCCc1ccc(CN(CC)c2cc(OC)ccc2[C@@H]2CCc3cc(O)ccc3C2)cc1",
756
+ "chembl_id": "CHEMBL4297509",
757
+ "targets": "Estrogen receptor alpha"
758
+ },
759
+ {
760
+ "name": "GIREDESTRANT",
761
+ "smiles": "C[C@@H]1Cc2c([nH]c3ccccc23)[C@@H](c2c(F)cc(NC3CN(CCCF)C3)cc2F)N1CC(F)(F)CO",
762
+ "chembl_id": "CHEMBL4650316",
763
+ "targets": "Estrogen receptor alpha"
764
+ },
765
+ {
766
+ "name": "CAMIZESTRANT",
767
+ "smiles": "C[C@@H]1Cc2c(ccc3[nH]ncc23)[C@@H](c2ccc(NC3CN(CCCF)C3)cn2)N1CC(F)(F)F",
768
+ "chembl_id": "CHEMBL4650365",
769
+ "targets": "Estrogen receptor alpha"
770
+ },
771
+ {
772
+ "name": "ESTRAMUSTINE PHOSPHATE SODIUM",
773
+ "smiles": "C[C@]12CC[C@@H]3c4ccc(OC(=O)N(CCCl)CCCl)cc4CC[C@H]3[C@@H]1CC[C@@H]2OP(=O)([O-])[O-].[Na+].[Na+]",
774
+ "chembl_id": "CHEMBL1200721",
775
+ "targets": "DNA,Estrogen receptor beta"
776
+ }
777
+ ],
778
+ "NR-AhR": [
779
+ {
780
+ "name": "TAPINAROF",
781
+ "smiles": "CC(C)c1c(O)cc(/C=C/c2ccccc2)cc1O",
782
+ "chembl_id": "CHEMBL259571",
783
+ "targets": "Aryl hydrocarbon receptor"
784
+ }
785
+ ],
786
+ "NR-PPAR-gamma": [
787
+ {
788
+ "name": "SEMAGACESTAT",
789
+ "smiles": "CC(C)[C@H](O)C(=O)N[C@@H](C)C(=O)N[C@@H]1C(=O)N(C)CCc2ccccc21",
790
+ "chembl_id": "CHEMBL520733",
791
+ "targets": "Gamma-secretase"
792
+ },
793
+ {
794
+ "name": "TARENFLURBIL",
795
+ "smiles": "C[C@@H](C(=O)O)c1ccc(-c2ccccc2)c(F)c1",
796
+ "chembl_id": "CHEMBL190083",
797
+ "targets": "Gamma-secretase"
798
+ },
799
+ {
800
+ "name": "NIROGACESTAT",
801
+ "smiles": "CCC[C@H](N[C@H]1CCc2cc(F)cc(F)c2C1)C(=O)Nc1cn(C(C)(C)CNCC(C)(C)C)cn1",
802
+ "chembl_id": "CHEMBL1770916",
803
+ "targets": "Gamma-secretase"
804
+ },
805
+ {
806
+ "name": "PALOVAROTENE",
807
+ "smiles": "CC1(C)CCC(C)(C)c2cc(Cn3cccn3)c(/C=C/c3ccc(C(=O)O)cc3)cc21",
808
+ "chembl_id": "CHEMBL2105648",
809
+ "targets": "Retinoic acid receptor gamma"
810
+ },
811
+ {
812
+ "name": "TROGLITAZONE",
813
+ "smiles": "Cc1c(C)c2c(c(C)c1O)CCC(C)(COc1ccc(CC3SC(=O)NC3=O)cc1)O2",
814
+ "chembl_id": "CHEMBL408",
815
+ "targets": "Peroxisome proliferator-activated receptor gamma"
816
+ },
817
+ {
818
+ "name": "PIOGLITAZONE HYDROCHLORIDE",
819
+ "smiles": "CCc1ccc(CCOc2ccc(CC3SC(=O)NC3=O)cc2)nc1.Cl",
820
+ "chembl_id": "CHEMBL1715",
821
+ "targets": "Peroxisome proliferator-activated receptor gamma"
822
+ },
823
+ {
824
+ "name": "RIVOGLITAZONE",
825
+ "smiles": "COc1ccc2nc(COc3ccc(CC4SC(=O)NC4=O)cc3)n(C)c2c1",
826
+ "chembl_id": "CHEMBL2104753",
827
+ "targets": "Peroxisome proliferator-activated receptor gamma"
828
+ },
829
+ {
830
+ "name": "TRIFAROTENE",
831
+ "smiles": "CC(C)(C)c1cc(-c2cc(-c3ccc(C(=O)O)cc3)ccc2OCCO)ccc1N1CCCC1",
832
+ "chembl_id": "CHEMBL3707313",
833
+ "targets": "Retinoic acid receptor gamma"
834
+ },
835
+ {
836
+ "name": "BEZAFIBRATE",
837
+ "smiles": "CC(C)(Oc1ccc(CCNC(=O)c2ccc(Cl)cc2)cc1)C(=O)O",
838
+ "chembl_id": "CHEMBL264374",
839
+ "targets": "Peroxisome proliferator-activated receptor"
840
+ },
841
+ {
842
+ "name": "MURAGLITAZAR",
843
+ "smiles": "COc1ccc(OC(=O)N(CC(=O)O)Cc2ccc(OCCc3nc(-c4ccccc4)oc3C)cc2)cc1",
844
+ "chembl_id": "CHEMBL186179",
845
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
846
+ },
847
+ {
848
+ "name": "LANIFIBRANOR",
849
+ "smiles": "O=C(O)CCCc1cc2cc(Cl)ccc2n1S(=O)(=O)c1ccc2ncsc2c1",
850
+ "chembl_id": "CHEMBL4091374",
851
+ "targets": "Peroxisome proliferator-activated receptor"
852
+ },
853
+ {
854
+ "name": "CHIGLITAZAR",
855
+ "smiles": "O=C(c1ccc(F)cc1)c1ccccc1N[C@@H](Cc1ccc(OCCn2c3ccccc3c3ccccc32)cc1)C(=O)O",
856
+ "chembl_id": "CHEMBL4650349",
857
+ "targets": "Peroxisome proliferator-activated receptor"
858
+ },
859
+ {
860
+ "name": "ALEGLITAZAR",
861
+ "smiles": "CO[C@@H](Cc1ccc(OCCc2nc(-c3ccccc3)oc2C)c2ccsc12)C(=O)O",
862
+ "chembl_id": "CHEMBL519504",
863
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
864
+ },
865
+ {
866
+ "name": "IMIGLITAZAR",
867
+ "smiles": "Cc1oc(-c2ccccc2)nc1COc1ccc(CO/N=C(\\CCC(=O)O)c2ccccc2)cc1",
868
+ "chembl_id": "CHEMBL592054",
869
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
870
+ },
871
+ {
872
+ "name": "ELAFIBRANOR",
873
+ "smiles": "CSc1ccc(C(=O)/C=C/c2cc(C)c(OC(C)(C)C(=O)O)c(C)c2)cc1",
874
+ "chembl_id": "CHEMBL3707395",
875
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor delta"
876
+ },
877
+ {
878
+ "name": "MK-0767",
879
+ "smiles": "COc1ccc(CC2OC(=O)NC2=O)cc1C(=O)NCc1ccc(C(F)(F)F)cc1",
880
+ "chembl_id": "CHEMBL4297404",
881
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
882
+ },
883
+ {
884
+ "name": "FENOFIBRIC ACID",
885
+ "smiles": "CC(C)(Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1)C(=O)O",
886
+ "chembl_id": "CHEMBL981",
887
+ "targets": "Peroxisome proliferator-activated receptor alpha"
888
+ },
889
+ {
890
+ "name": "PEMAFIBRATE",
891
+ "smiles": "CC[C@@H](Oc1cccc(CN(CCCOc2ccc(OC)cc2)c2nc3ccccc3o2)c1)C(=O)O",
892
+ "chembl_id": "CHEMBL247951",
893
+ "targets": "Peroxisome proliferator-activated receptor alpha"
894
+ },
895
+ {
896
+ "name": "TESAGLITAZAR",
897
+ "smiles": "CCO[C@@H](Cc1ccc(OCCc2ccc(OS(C)(=O)=O)cc2)cc1)C(=O)O",
898
+ "chembl_id": "CHEMBL282686",
899
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
900
+ },
901
+ {
902
+ "name": "SAROGLITAZAR",
903
+ "smiles": "CCO[C@@H](Cc1ccc(OCCn2c(C)ccc2-c2ccc(SC)cc2)cc1)C(=O)O",
904
+ "chembl_id": "CHEMBL4297530",
905
+ "targets": "Peroxisome proliferator-activated receptor alpha,Peroxisome proliferator-activated receptor gamma"
906
+ },
907
+ {
908
+ "name": "CIPROFIBRATE",
909
+ "smiles": "CC(C)(Oc1ccc(C2CC2(Cl)Cl)cc1)C(=O)O",
910
+ "chembl_id": "CHEMBL557555",
911
+ "targets": "Peroxisome proliferator-activated receptor alpha"
912
+ },
913
+ {
914
+ "name": "FONADELPAR",
915
+ "smiles": "Cc1cc2c(CCc3sc(-c4ccc(C(F)(F)F)cc4)nc3C(C)C)noc2cc1OCC(=O)O",
916
+ "chembl_id": "CHEMBL3545186",
917
+ "targets": "Peroxisome proliferator-activated receptor delta"
918
+ },
919
+ {
920
+ "name": "GEMFIBROZIL",
921
+ "smiles": "Cc1ccc(C)c(OCCCC(C)(C)C(=O)O)c1",
922
+ "chembl_id": "CHEMBL457",
923
+ "targets": "Peroxisome proliferator-activated receptor alpha"
924
+ },
925
+ {
926
+ "name": "CLOFIBRATE",
927
+ "smiles": "CCOC(=O)C(C)(C)Oc1ccc(Cl)cc1",
928
+ "chembl_id": "CHEMBL565",
929
+ "targets": "Peroxisome proliferator-activated receptor alpha"
930
+ },
931
+ {
932
+ "name": "FENOFIBRATE",
933
+ "smiles": "CC(C)OC(=O)C(C)(C)Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1",
934
+ "chembl_id": "CHEMBL672",
935
+ "targets": "Peroxisome proliferator-activated receptor alpha"
936
+ },
937
+ {
938
+ "name": "ROSIGLITAZONE MALEATE",
939
+ "smiles": "CN(CCOc1ccc(CC2SC(=O)NC2=O)cc1)c1ccccn1.O=C(O)/C=C\\C(=O)O",
940
+ "chembl_id": "CHEMBL843",
941
+ "targets": "Peroxisome proliferator-activated receptor gamma"
942
+ },
943
+ {
944
+ "name": "LERIGLITAZONE",
945
+ "smiles": "CC(O)c1ccc(CCOc2ccc(CC3SC(=O)NC3=O)cc2)nc1",
946
+ "chembl_id": "CHEMBL1267",
947
+ "targets": "Peroxisome proliferator-activated receptor gamma"
948
+ },
949
+ {
950
+ "name": "SELADELPAR",
951
+ "smiles": "CCO[C@H](COc1ccc(C(F)(F)F)cc1)CSc1ccc(OCC(=O)O)c(C)c1",
952
+ "chembl_id": "CHEMBL230158",
953
+ "targets": "Peroxisome proliferator-activated receptor delta"
954
+ },
955
+ {
956
+ "name": "CHOLINE FENOFIBRATE",
957
+ "smiles": "CC(C)(Oc1ccc(C(=O)c2ccc(Cl)cc2)cc1)C(=O)[O-].C[N+](C)(C)CCO",
958
+ "chembl_id": "CHEMBL1201745",
959
+ "targets": "Peroxisome proliferator-activated receptor alpha"
960
+ },
961
+ {
962
+ "name": "BALAGLITAZONE",
963
+ "smiles": "Cn1c(COc2ccc(CC3SC(=O)NC3=O)cc2)nc2ccccc2c1=O",
964
+ "chembl_id": "CHEMBL2103991",
965
+ "targets": "Peroxisome proliferator-activated receptor gamma"
966
+ },
967
+ {
968
+ "name": "BALSALAZIDE DISODIUM",
969
+ "smiles": "O=C([O-])CCNC(=O)c1ccc(/N=N/c2ccc(O)c(C(=O)[O-])c2)cc1.[Na+].[Na+]",
970
+ "chembl_id": "CHEMBL1200760",
971
+ "targets": "Arachidonate 5-lipoxygenase,Cyclooxygenase,Peroxisome proliferator-activated receptor gamma"
972
+ },
973
+ {
974
+ "name": "MESALAMINE",
975
+ "smiles": "Nc1ccc(O)c(C(=O)O)c1",
976
+ "chembl_id": "CHEMBL704",
977
+ "targets": "Arachidonate 5-lipoxygenase,Cyclooxygenase,Peroxisome proliferator-activated receptor gamma"
978
+ },
979
+ {
980
+ "name": "OLSALAZINE SODIUM",
981
+ "smiles": "O=C([O-])c1cc(/N=N/c2ccc(O)c(C(=O)[O-])c2)ccc1O.[Na+].[Na+]",
982
+ "chembl_id": "CHEMBL1201013",
983
+ "targets": "Arachidonate 5-lipoxygenase,Cyclooxygenase,Peroxisome proliferator-activated receptor gamma"
984
+ },
985
+ {
986
+ "name": "BARDOXOLONE METHYL",
987
+ "smiles": "COC(=O)[C@]12CCC(C)(C)C[C@H]1[C@H]1C(=O)C=C3[C@@]4(C)C=C(C#N)C(=O)C(C)(C)[C@@H]4CC[C@@]3(C)[C@]1(C)CC2",
988
+ "chembl_id": "CHEMBL1762621",
989
+ "targets": "Inhibitor of nuclear factor kappa B kinase beta subunit,Keap1/Nrf2,Peroxisome proliferator-activated receptor gamma"
990
+ }
991
+ ],
992
+ "SR-p53": [
993
+ {
994
+ "name": "EPRENETAPOPT",
995
+ "smiles": "COCC1(CO)C(=O)C2CCN1CC2",
996
+ "chembl_id": "CHEMBL3186011",
997
+ "targets": "Cellular tumor antigen p53"
998
+ },
999
+ {
1000
+ "name": "IDASANUTLIN",
1001
+ "smiles": "COc1cc(C(=O)O)ccc1NC(=O)[C@@H]1N[C@@H](CC(C)(C)C)[C@](C#N)(c2ccc(Cl)cc2F)[C@H]1c1cccc(Cl)c1F",
1002
+ "chembl_id": "CHEMBL2402737",
1003
+ "targets": "Tumour suppressor p53/oncoprotein Mdm2"
1004
+ }
1005
+ ],
1006
+ "SR-MMP": [
1007
+ {
1008
+ "name": "TOPOTECAN HYDROCHLORIDE",
1009
+ "smiles": "CC[C@@]1(O)C(=O)OCc2c1cc1n(c2=O)Cc2cc3c(CN(C)C)c(O)ccc3nc2-1.Cl",
1010
+ "chembl_id": "CHEMBL1607",
1011
+ "targets": "DNA topoisomerase I, mitochondrial"
1012
+ },
1013
+ {
1014
+ "name": "CARGLUMIC ACID",
1015
+ "smiles": "NC(=O)N[C@@H](CCC(=O)O)C(=O)O",
1016
+ "chembl_id": "CHEMBL1201780",
1017
+ "targets": "Carbamoyl-phosphate synthase [ammonia], mitochondrial"
1018
+ },
1019
+ {
1020
+ "name": "ENASIDENIB MESYLATE",
1021
+ "smiles": "CC(C)(O)CNc1nc(Nc2ccnc(C(F)(F)F)c2)nc(-c2cccc(C(F)(F)F)n2)n1.CS(=O)(=O)O",
1022
+ "chembl_id": "CHEMBL3989931",
1023
+ "targets": "Isocitrate dehydrogenase [NADP], mitochondrial"
1024
+ },
1025
+ {
1026
+ "name": "OLOROFIM",
1027
+ "smiles": "Cc1cc(-c2ccccc2)c(C(=O)C(=O)Nc2ccc(N3CCN(c4ncc(F)cn4)CC3)cc2)n1C",
1028
+ "chembl_id": "CHEMBL4297609",
1029
+ "targets": "Dihydroorotate dehydrogenase (quinone), mitochondrial"
1030
+ },
1031
+ {
1032
+ "name": "METFORMIN HYDROCHLORIDE",
1033
+ "smiles": "CN(C)C(=N)NC(=N)N.Cl",
1034
+ "chembl_id": "CHEMBL1703",
1035
+ "targets": "Mitochondrial complex I (NADH dehydrogenase),Mitochondrial glycerol-3-phosphate dehydrogenase"
1036
+ }
1037
+ ],
1038
+ "SR-HSE": [
1039
+ {
1040
+ "name": "RETASPIMYCIN HYDROCHLORIDE",
1041
+ "smiles": "C=CCNc1c(O)cc2c(O)c1C[C@@H](C)C[C@H](OC)[C@H](O)[C@@H](C)/C=C(\\C)[C@H](OC(N)=O)[C@@H](OC)/C=C\\C=C(/C)C(=O)N2.Cl",
1042
+ "chembl_id": "CHEMBL377559",
1043
+ "targets": "Heat shock protein HSP90"
1044
+ },
1045
+ {
1046
+ "name": "TANESPIMYCIN",
1047
+ "smiles": "C=CCNC1=C2C[C@@H](C)C[C@H](OC)[C@H](O)[C@@H](C)/C=C(\\C)[C@H](OC(N)=O)[C@@H](OC)/C=C\\C=C(/C)C(=O)NC(=CC1=O)C2=O",
1048
+ "chembl_id": "CHEMBL109480",
1049
+ "targets": "Heat shock protein HSP90"
1050
+ },
1051
+ {
1052
+ "name": "GANETESPIB",
1053
+ "smiles": "CC(C)c1cc(-c2n[nH]c(=O)n2-c2ccc3c(ccn3C)c2)c(O)cc1O",
1054
+ "chembl_id": "CHEMBL2103879",
1055
+ "targets": "Heat shock protein HSP90"
1056
+ },
1057
+ {
1058
+ "name": "FORIGERIMOD ACETATE",
1059
+ "smiles": "CC(=O)O.CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](C)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)CNC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCCN)NC(=O)CNC(=O)[C@H](COP(=O)(O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CO)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@@H](NC(=O)[C@H](CCSC)NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)[C@@H](NC(=O)[C@@H](N)CCCN=C(N)N)[C@@H](C)CC)C(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)O",
1060
+ "chembl_id": "CHEMBL3989801",
1061
+ "targets": "Heat shock cognate 71 kDa protein"
1062
+ },
1063
+ {
1064
+ "name": "PLECANATIDE",
1065
+ "smiles": "CC(C)C[C@H](NC(=O)[C@@H]1CSSC[C@@H]2NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H](N)CC(N)=O)CSSC[C@H](NC(=O)[C@H](C)NC(=O)[C@H](C(C)C)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](C(C)C)NC2=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N1)C(=O)O",
1066
+ "chembl_id": "CHEMBL2103867",
1067
+ "targets": "Heat-stable enterotoxin receptor"
1068
+ },
1069
+ {
1070
+ "name": "LINACLOTIDE",
1071
+ "smiles": "C[C@@H]1NC(=O)[C@@H]2CCCN2C(=O)[C@H](CC(N)=O)NC(=O)[C@@H]2CSSC[C@H](N)C(=O)N[C@H]3CSSC[C@H](NC1=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@H](C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)O)CSSC[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CCC(=O)O)NC3=O)C(=O)N2",
1072
+ "chembl_id": "CHEMBL3301675",
1073
+ "targets": "Heat-stable enterotoxin receptor"
1074
+ }
1075
+ ],
1076
+ "SR-ATAD5": [
1077
+ {
1078
+ "name": "PENTAMIDINE ISETHIONATE",
1079
+ "smiles": "N=C(N)c1ccc(OCCCCCOc2ccc(C(=N)N)cc2)cc1.O=S(=O)(O)CCO.O=S(=O)(O)CCO",
1080
+ "chembl_id": "CHEMBL361506",
1081
+ "targets": "DNA,Kinetoplast DNA"
1082
+ },
1083
+ {
1084
+ "name": "DACARBAZINE",
1085
+ "smiles": "CN(C)/N=N/c1[nH]cnc1C(N)=O",
1086
+ "chembl_id": "CHEMBL476",
1087
+ "targets": "DNA"
1088
+ },
1089
+ {
1090
+ "name": "LOMUSTINE",
1091
+ "smiles": "O=NN(CCCl)C(=O)NC1CCCCC1",
1092
+ "chembl_id": "CHEMBL514",
1093
+ "targets": "DNA"
1094
+ },
1095
+ {
1096
+ "name": "NITROFURANTOIN",
1097
+ "smiles": "O=C1CN(/N=C/c2ccc([N+](=O)[O-])o2)C(=O)N1",
1098
+ "chembl_id": "CHEMBL572",
1099
+ "targets": "DNA"
1100
+ },
1101
+ {
1102
+ "name": "IDOXURIDINE",
1103
+ "smiles": "O=c1[nH]c(=O)n([C@H]2C[C@H](O)[C@@H](CO)O2)cc1I",
1104
+ "chembl_id": "CHEMBL788",
1105
+ "targets": "DNA"
1106
+ },
1107
+ {
1108
+ "name": "BUSULFAN",
1109
+ "smiles": "CS(=O)(=O)OCCCCOS(C)(=O)=O",
1110
+ "chembl_id": "CHEMBL820",
1111
+ "targets": "DNA"
1112
+ },
1113
+ {
1114
+ "name": "PALIFOSFAMIDE",
1115
+ "smiles": "O=P(O)(NCCCl)NCCCl",
1116
+ "chembl_id": "CHEMBL889",
1117
+ "targets": "DNA"
1118
+ },
1119
+ {
1120
+ "name": "TIRAPAZAMINE",
1121
+ "smiles": "Nc1n[n+]([O-])c2ccccc2[n+]1[O-]",
1122
+ "chembl_id": "CHEMBL50882",
1123
+ "targets": "DNA"
1124
+ },
1125
+ {
1126
+ "name": "IFOSFAMIDE",
1127
+ "smiles": "O=P1(NCCCl)OCCCN1CCCl",
1128
+ "chembl_id": "CHEMBL1024",
1129
+ "targets": "DNA"
1130
+ },
1131
+ {
1132
+ "name": "FURAZOLIDONE",
1133
+ "smiles": "O=C1OCCN1/N=C/c1ccc([N+](=O)[O-])o1",
1134
+ "chembl_id": "CHEMBL1103",
1135
+ "targets": "DNA"
1136
+ },
1137
+ {
1138
+ "name": "CLOFAZIMINE",
1139
+ "smiles": "CC(C)/N=c1\\cc2n(-c3ccc(Cl)cc3)c3ccccc3nc-2cc1Nc1ccc(Cl)cc1",
1140
+ "chembl_id": "CHEMBL1292",
1141
+ "targets": "DNA"
1142
+ },
1143
+ {
1144
+ "name": "ALTRETAMINE",
1145
+ "smiles": "CN(C)c1nc(N(C)C)nc(N(C)C)n1",
1146
+ "chembl_id": "CHEMBL1455",
1147
+ "targets": "DNA"
1148
+ },
1149
+ {
1150
+ "name": "TRIOXSALEN",
1151
+ "smiles": "Cc1cc2cc3c(C)cc(=O)oc3c(C)c2o1",
1152
+ "chembl_id": "CHEMBL1475",
1153
+ "targets": "DNA"
1154
+ },
1155
+ {
1156
+ "name": "URACIL MUSTARD",
1157
+ "smiles": "Oc1ncc(N(CCCl)CCCl)c(O)n1",
1158
+ "chembl_id": "CHEMBL1488",
1159
+ "targets": "DNA"
1160
+ },
1161
+ {
1162
+ "name": "DACTINOMYCIN",
1163
+ "smiles": "Cc1c2oc3c(C)ccc(C(=O)N[C@@H]4C(=O)N[C@H](C(C)C)C(=O)N5CCC[C@H]5C(=O)N(C)CC(=O)N(C)[C@@H](C(C)C)C(=O)O[C@@H]4C)c3nc-2c(C(=O)N[C@@H]2C(=O)N[C@H](C(C)C)C(=O)N3CCC[C@H]3C(=O)N(C)CC(=O)N(C)[C@@H](C(C)C)C(=O)O[C@@H]2C)c(N)c1=O",
1164
+ "chembl_id": "CHEMBL1554",
1165
+ "targets": "DNA"
1166
+ },
1167
+ {
1168
+ "name": "CLADRIBINE",
1169
+ "smiles": "Nc1nc(Cl)nc2c1ncn2[C@H]1C[C@H](O)[C@@H](CO)O1",
1170
+ "chembl_id": "CHEMBL1619",
1171
+ "targets": "DNA"
1172
+ },
1173
+ {
1174
+ "name": "FOTEMUSTINE",
1175
+ "smiles": "CCOP(=O)(OCC)C(C)NC(=O)N(CCCl)N=O",
1176
+ "chembl_id": "CHEMBL549386",
1177
+ "targets": "DNA"
1178
+ },
1179
+ {
1180
+ "name": "CHLOROXINE",
1181
+ "smiles": "Oc1c(Cl)cc(Cl)c2cccnc12",
1182
+ "chembl_id": "CHEMBL1200596",
1183
+ "targets": "DNA"
1184
+ },
1185
+ {
1186
+ "name": "METRONIDAZOLE HYDROCHLORIDE",
1187
+ "smiles": "Cc1ncc([N+](=O)[O-])n1CCO.Cl",
1188
+ "chembl_id": "CHEMBL1200869",
1189
+ "targets": "DNA"
1190
+ },
1191
+ {
1192
+ "name": "METHYL AMINOLEVULINATE HYDROCHLORIDE",
1193
+ "smiles": "COC(=O)CCC(=O)CN.Cl",
1194
+ "chembl_id": "CHEMBL1201093",
1195
+ "targets": "DNA"
1196
+ },
1197
+ {
1198
+ "name": "NELARABINE",
1199
+ "smiles": "COc1nc(N)nc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O",
1200
+ "chembl_id": "CHEMBL1201112",
1201
+ "targets": "DNA"
1202
+ },
1203
+ {
1204
+ "name": "PIXANTRONE DIMALEATE",
1205
+ "smiles": "NCCNc1ccc(NCCN)c2c1C(=O)c1ccncc1C2=O.O=C(O)/C=C\\C(=O)O.O=C(O)/C=C\\C(=O)O",
1206
+ "chembl_id": "CHEMBL2103844",
1207
+ "targets": "DNA"
1208
+ },
1209
+ {
1210
+ "name": "SAPACITABINE",
1211
+ "smiles": "CCCCCCCCCCCCCCCC(=O)Nc1ccn([C@@H]2O[C@H](CO)[C@@H](O)[C@@H]2C#N)c(=O)n1",
1212
+ "chembl_id": "CHEMBL2105681",
1213
+ "targets": "DNA"
1214
+ },
1215
+ {
1216
+ "name": "MELPHALAN FLUFENAMIDE HYDROCHLORIDE",
1217
+ "smiles": "CCOC(=O)[C@H](Cc1ccc(F)cc1)NC(=O)[C@@H](N)Cc1ccc(N(CCCl)CCCl)cc1.Cl",
1218
+ "chembl_id": "CHEMBL4297403",
1219
+ "targets": "DNA"
1220
+ },
1221
+ {
1222
+ "name": "VOSAROXIN",
1223
+ "smiles": "CN[C@H]1CN(c2ccc3c(=O)c(C(=O)O)cn(-c4nccs4)c3n2)C[C@@H]1OC",
1224
+ "chembl_id": "CHEMBL68117",
1225
+ "targets": "DNA,DNA topoisomerase II"
1226
+ }
1227
+ ]
1228
+ }
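A note on the record shape above: each task maps to a list of records with `name`, `smiles`, `chembl_id`, and a single `targets` string. In the entries shown, multiple targets appear to be joined by a bare comma (`"DNA,Kinetoplast DNA"`), while a comma followed by a space occurs inside a single target name (`"DNA topoisomerase I, mitochondrial"`). The sketch below, assuming that convention holds for the whole file, splits `targets` accordingly. `split_targets` is a hypothetical helper, and the sample is a small subset copied from the records above rather than the real file, whose path is not shown in this excerpt.

```python
import json
import re

# Small sample copied from the records above (not the full file).
SAMPLE_JSON = """
{
  "SR-MMP": [
    {
      "name": "TOPOTECAN HYDROCHLORIDE",
      "smiles": "CC[C@@]1(O)C(=O)OCc2c1cc1n(c2=O)Cc2cc3c(CN(C)C)c(O)ccc3nc2-1.Cl",
      "chembl_id": "CHEMBL1607",
      "targets": "DNA topoisomerase I, mitochondrial"
    },
    {
      "name": "METFORMIN HYDROCHLORIDE",
      "smiles": "CN(C)C(=N)NC(=N)N.Cl",
      "chembl_id": "CHEMBL1703",
      "targets": "Mitochondrial complex I (NADH dehydrogenase),Mitochondrial glycerol-3-phosphate dehydrogenase"
    }
  ],
  "SR-ATAD5": [
    {
      "name": "PENTAMIDINE ISETHIONATE",
      "smiles": "N=C(N)c1ccc(OCCCCCOc2ccc(C(=N)N)cc2)cc1.O=S(=O)(O)CCO.O=S(=O)(O)CCO",
      "chembl_id": "CHEMBL361506",
      "targets": "DNA,Kinetoplast DNA"
    }
  ]
}
"""


def split_targets(targets):
    """Split on commas that are NOT followed by a space.

    Assumption: a bare comma separates targets, while ", " is part of a name.
    """
    return re.split(r",(?! )", targets)


data = json.loads(SAMPLE_JSON)

# Map each task to {compound name: list of individual targets}.
targets_by_task = {
    task: {rec["name"]: split_targets(rec["targets"]) for rec in records}
    for task, records in data.items()
}

for task, compounds in targets_by_task.items():
    for name, targets in compounds.items():
        print(f"{task}: {name} -> {len(targets)} target(s)")
```

If the real file ever uses `", "` as a separator between targets, this split would under-count, so it is worth spot-checking a few records before relying on it.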
data/toxicophores_validated.json ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi>=0.100.0
+ uvicorn>=0.22.0
+ pydantic>=2.0.0
+ numpy>=1.24.0
+ torch>=2.0.0
+ rdkit>=2023.3.1
+ scikit-learn>=1.3.0
src/__init__.py ADDED
@@ -0,0 +1,7 @@
+ from .model import Tox21SNN
+ from .features import EnhancedFeatureExtractor, TOX21_TARGETS
+ from .ensemble import Tox21Ensemble
+
+ TASKS = TOX21_TARGETS
+
+ __all__ = ["Tox21SNN", "EnhancedFeatureExtractor", "Tox21Ensemble", "TASKS"]
src/ensemble.py ADDED
@@ -0,0 +1,78 @@
+ import numpy as np
+ import torch
+ from pathlib import Path
+ from .model import Tox21SNN
+
+ ECFP_END = 8192
+ MACCS_END = ECFP_END + 167
+ RDKIT_END = MACCS_END + 208
+ TOX_END = RDKIT_END + 1868
+
+
+ class FoldPredictor:
+
+     def __init__(self, fold_data, device):
+         self.device = device
+         self.ecfp_indices = fold_data["ecfp_indices"]
+         self.tox_indices = fold_data["tox_indices"]
+         self.in_features = fold_data["in_features"]
+
+         scaler = fold_data["scaler_state"]
+         self.s1_mean = np.array(scaler["scaler1_mean"], dtype=np.float32)
+         self.s1_scale = np.array(scaler["scaler1_scale"], dtype=np.float32)
+         self.s2_mean = np.array(scaler["scaler2_mean"], dtype=np.float32)
+         self.s2_scale = np.array(scaler["scaler2_scale"], dtype=np.float32)
+
+         self.model = Tox21SNN(in_features=self.in_features, dropout=0.0)
+         self.model.load_state_dict(fold_data["model_state"])
+         self.model.to(device)
+         self.model.eval()
+
+     def _select_features(self, X):
+         return np.concatenate([
+             X[:, :ECFP_END][:, self.ecfp_indices],
+             X[:, ECFP_END:MACCS_END],
+             X[:, MACCS_END:RDKIT_END],
+             X[:, RDKIT_END:TOX_END][:, self.tox_indices],
+             X[:, TOX_END:]
+         ], axis=1)
+
+     def _scale(self, X):
+         X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
+         X = (X - self.s1_mean) / np.clip(self.s1_scale, 1e-10, None)
+         X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
+         X = np.tanh(X)
+         X = (X - self.s2_mean) / np.clip(self.s2_scale, 1e-10, None)
+         return X
+
+     @torch.no_grad()
+     def predict(self, X_raw):
+         X = self._select_features(X_raw)
+         X = self._scale(X)
+         X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
+         tensor = torch.tensor(X, dtype=torch.float32, device=self.device)
+         logits = self.model(tensor)
+         return torch.sigmoid(logits).cpu().numpy()
+
+
+ class Tox21Ensemble:
+
+     def __init__(self, checkpoint_path, device=None):
+         self.device = device or torch.device("cuda" if torch.cuda.is_available() else "cpu")
+         self.predictors = []
+
+         checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
+         self.n_folds = checkpoint["n_folds"]
+         self.mean_auc = checkpoint["mean_auc"]
+
+         for fold_data in checkpoint["folds"]:
+             predictor = FoldPredictor(fold_data, self.device)
+             self.predictors.append(predictor)
+
+     @torch.no_grad()
+     def predict(self, X_raw):
+         predictions = []
+         for predictor in self.predictors:
+             pred = predictor.predict(X_raw)
+             predictions.append(pred)
+         return np.mean(predictions, axis=0)
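The preprocessing in `FoldPredictor._scale` (standardize, squash with `tanh`, standardize again) and the fold averaging in `Tox21Ensemble.predict` can be tried in isolation. What follows is a minimal NumPy-only sketch, not the repository's code: `double_scale`, `average_folds`, and the identity scaler statistics are hypothetical and chosen for illustration; real statistics come from the checkpoint's `scaler_state`. The block boundaries are copied from the constants above.

```python
import numpy as np

# Feature-block boundaries, copied from src/ensemble.py.
ECFP_END = 8192
MACCS_END = ECFP_END + 167    # 8359
RDKIT_END = MACCS_END + 208   # 8567
TOX_END = RDKIT_END + 1868    # 10435
# The commit message reports 11,369 features in total, which would leave
# 11369 - 10435 = 934 columns after TOX_END. Mapping that remainder to the
# similarity features is an assumption, not something the code states.


def double_scale(X, mean1, scale1, mean2, scale2, eps=1e-10):
    """Standardize, apply tanh, then standardize again (mirrors _scale)."""
    X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
    X = (X - mean1) / np.clip(scale1, eps, None)
    X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)
    X = np.tanh(X)  # bounds every value to [-1, 1], taming extreme descriptors
    return (X - mean2) / np.clip(scale2, eps, None)


def average_folds(fold_probs):
    """Average per-fold probability arrays of identical shape."""
    return np.mean(fold_probs, axis=0)


# Toy input with an extreme value and a NaN, as raw descriptors can produce.
X = np.array([[0.0, 1e6, np.nan],
              [2.0, -1e6, 1.0]], dtype=np.float32)
zeros = np.zeros(3, dtype=np.float32)
ones = np.ones(3, dtype=np.float32)

# With identity statistics the result is simply tanh of the cleaned input.
Z = double_scale(X, zeros, ones, zeros, ones)
print(Z)

# Two hypothetical folds, each predicting one probability for one molecule.
avg = average_folds([np.array([[0.2]]), np.array([[0.4]])])
print(avg)
```

The `tanh` step is what keeps a single extreme descriptor from dominating the input; the second standardization then re-centers the squashed values.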
src/features.py ADDED
@@ -0,0 +1,457 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import numpy as np
3
+ from rdkit import Chem, DataStructs
4
+ from rdkit.Chem import AllChem, Descriptors, MACCSkeys
5
+ from rdkit.Chem import rdFingerprintGenerator
6
+ from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams
7
+ from rdkit.Chem.MolStandardize import rdMolStandardize
8
+
9
+
10
+ TOX21_TARGETS = [
11
+ "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
12
+ "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
13
+ ]
14
+
15
+ USED_200_DESCR = [
16
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 25, 26, 27,
17
+ 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
18
+ 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
19
+ 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
20
+ 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
21
+ 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
22
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
23
+ 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141,
24
+ 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
25
+ 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
26
+ 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,
27
+ 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
28
+ 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
29
+ ]
30
+
+ REFERENCE_LIGANDS = {
+     "NR-AR": [
+         ("testosterone", "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C"),
+         ("dihydrotestosterone", "CC12CCC3C(C1CCC2O)CCC4CC(=O)CCC34C"),
+         ("methyltrienolone", "CC12CCC3C(C1CCC2O)CCC4=CC(=O)C=CC34C"),
+         ("flutamide", "CC(C)C(=O)Nc1ccc(c(c1)C(F)(F)F)[N+](=O)[O-]"),
+         ("bicalutamide", "CC(CS(=O)(=O)c1ccc(F)cc1)(O)C(=O)Nc1ccc(C#N)c(c1)C(F)(F)F"),
+         ("enzalutamide", "CNC(=O)c1ccc(N2C(=S)N(c3ccc(C#N)c(C(F)(F)F)c3)C(=O)C2(C)C)cc1F"),
+     ],
+     "NR-AR-LBD": [
+         ("testosterone", "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C"),
+         ("dihydrotestosterone", "CC12CCC3C(C1CCC2O)CCC4CC(=O)CCC34C"),
+         ("bicalutamide", "CC(CS(=O)(=O)c1ccc(F)cc1)(O)C(=O)Nc1ccc(C#N)c(c1)C(F)(F)F"),
+     ],
+     "NR-AhR": [
+         ("tcdd", "Clc1cc2Oc3cc(Cl)c(Cl)cc3Oc2cc1Cl"),
+         ("benzo_a_pyrene", "c1ccc2c(c1)cc3ccc4cccc5ccc2c3c45"),
+         ("beta_naphthoflavone", "O=c1cc(-c2ccc3ccccc3c2)oc2ccc3ccccc3c12"),
+         ("indirubin", "O=C1Nc2ccccc2C1=C1C(=O)Nc2ccccc21"),
+     ],
+     "NR-Aromatase": [
+         ("exemestane", "CC12CCC3C(C1CC(=C)C2=O)CCC4=CC(=O)C=CC34C"),
+         ("letrozole", "N#Cc1ccc(Cn2cncn2)c(c1)c1ccc(C#N)cc1"),
+         ("anastrozole", "CC(C)(C#N)c1cc(Cn2cncn2)cc(c1)C(C)(C)C#N"),
+         ("androstenedione", "CC12CCC3C(C1CCC2=O)CCC4=CC(=O)CCC34C"),
+     ],
+     "NR-ER": [
+         ("estradiol", "CC12CCC3c4ccc(O)cc4CCC3C1CCC2O"),
+         ("diethylstilbestrol", "CCC(=C(CC)c1ccc(O)cc1)c1ccc(O)cc1"),
+         ("tamoxifen", "CCC(=C(c1ccccc1)c1ccc(OCCN(C)C)cc1)c1ccccc1"),
+         ("genistein", "Oc1ccc(cc1)C1=COc2cc(O)cc(O)c2C1=O"),
+         ("raloxifene", "Oc1ccc(cc1)c1sc2cc(O)ccc2c1C(=O)c1ccc(OCCN2CCCCC2)cc1"),
+     ],
+     "NR-ER-LBD": [
+         ("estradiol", "CC12CCC3c4ccc(O)cc4CCC3C1CCC2O"),
+         ("diethylstilbestrol", "CCC(=C(CC)c1ccc(O)cc1)c1ccc(O)cc1"),
+         ("raloxifene", "Oc1ccc(cc1)c1sc2cc(O)ccc2c1C(=O)c1ccc(OCCN2CCCCC2)cc1"),
+     ],
+     "NR-PPAR-gamma": [
+         ("rosiglitazone", "CN(CCOc1ccc(CC2SC(=O)NC2=O)cc1)c1ccccn1"),
+         ("pioglitazone", "CCc1ccc(CCOc2ccc(CC3SC(=O)NC3=O)cc2)nc1"),
+         ("troglitazone", "Cc1c(C)c2OC(C)(C)CCc2c(C)c1Oc1ccc(CC2SC(=O)NC2=O)cc1"),
+     ],
+     "SR-ARE": [
+         ("sulforaphane", "CS(=O)CCCCN=C=S"),
+         ("tert_butylhydroquinone", "CC(C)(C)c1cc(O)ccc1O"),
+         ("curcumin", "COc1cc(C=CC(=O)CC(=O)C=Cc2ccc(O)c(OC)c2)ccc1O"),
+     ],
+     "SR-ATAD5": [
+         ("camptothecin", "CCC1(O)C(=O)OCc2c1cc3n(c2=O)c1ccccc1nc3"),
+         ("etoposide", "COc1cc(cc(OC)c1O)C1C2C(COC2=O)C(OC2OC3COC(C)OC3C(O)C2O)c2cc3OCOc3cc12"),
+     ],
+     "SR-HSE": [
+         ("geldanamycin", "COC1CC(C)CC2=C(NCC=C(C)C(OC)C(C)C(OC(N)=O)C(C)C=C(C)C=C(C)C(=O)N1)C(=O)C=C(N)C2=O"),
+         ("ganetespib", "CC(C)c1cc(-c2n[nH]c(=O)n2-c2ccc3c(ccn3C)c2)c(O)cc1O"),
+     ],
+     "SR-MMP": [
+         ("cccp", "N#CC(=Cc1ccc([N+](=O)[O-])cc1)C#N"),
+         ("fccp", "N#CC(=Cc1ccc(cc1)C(F)(F)F)C#N"),
+         ("rotenone", "COc1cc2C3CC(C)OC3c3ccc4OC5OCCC5c4c3c2cc1OC"),
+         ("antimycin_a", "CCCCCC(C)C(OC(=O)c1ccccc1N)C(NC(=O)c1cccc(NC=O)c1O)C(C)O"),
+     ],
+     "SR-p53": [
+         ("nutlin_3", "COc1ccc(c(OC)c1)C1N(C(=O)C(N1c1ccc(Cl)cc1)c1ccc(Cl)cc1)C1CCNCC1"),
+         ("doxorubicin", "COc1cccc2c1C(=O)c1c(O)c3CC(O)(CC(OC4CC(N)C(O)C(C)O4)c3c(O)c1C2=O)C(=O)CO"),
+     ],
+ }
+
+
+ class EnhancedFeatureExtractor:
+     """Builds per-molecule feature groups (ECFP counts, MACCS keys, RDKit
+     descriptors, structural alerts, ligand similarities) from SMILES."""
+
+     def __init__(
+         self,
+         toxicophores_path=None,
+         db_ligands_path=None,
+         use_rdkit_filters=True,
+         use_similarity=True,
+         use_db_ligands=True,
+         ecfp_radius=3,
+         ecfp_bits=8192,
+         sim_radius=2,
+         sim_bits=2048,
+     ):
+         self.toxicophores_path = toxicophores_path
+         self.db_ligands_path = db_ligands_path
+         self.use_rdkit_filters = use_rdkit_filters
+         self.use_similarity = use_similarity
+         self.use_db_ligands = use_db_ligands
+         self.ecfp_radius = ecfp_radius
+         self.ecfp_bits = ecfp_bits
+         self.sim_radius = sim_radius
+         self.sim_bits = sim_bits
+
+         # Lazily built caches; each is populated on first use.
+         self._toxicophore_patterns = None
+         self._filter_catalogs = None
+         self._ref_fps = None
+         self._db_ligand_fps = None
+         self._standardizer = None
+
+     def _get_standardizer(self):
+         if self._standardizer is None:
+             self._standardizer = _Standardizer()
+         return self._standardizer
+
+     def _load_toxicophores(self):
+         # The JSON file is iterated as (name, SMARTS) pairs; SMARTS that fail
+         # to parse are skipped.
+         if self._toxicophore_patterns is None:
+             if self.toxicophores_path:
+                 with open(self.toxicophores_path) as f:
+                     data = json.load(f)
+                 self._toxicophore_patterns = []
+                 for name, smarts in data:
+                     pat = Chem.MolFromSmarts(smarts)
+                     if pat:
+                         self._toxicophore_patterns.append((name, pat))
+         return self._toxicophore_patterns
+
+     def _load_filter_catalogs(self):
+         if self._filter_catalogs is None:
+             self._filter_catalogs = {}
+             for name, cat_type in [
+                 ("PAINS", FilterCatalogParams.FilterCatalogs.PAINS),
+                 ("BRENK", FilterCatalogParams.FilterCatalogs.BRENK),
+                 ("NIH", FilterCatalogParams.FilterCatalogs.NIH),
+                 ("ZINC", FilterCatalogParams.FilterCatalogs.ZINC),
+             ]:
+                 params = FilterCatalogParams()
+                 params.AddCatalog(cat_type)
+                 self._filter_catalogs[name] = FilterCatalog(params)
+         return self._filter_catalogs
+
+     def _load_ref_fps(self):
+         # Fingerprints of the hard-coded REFERENCE_LIGANDS, grouped by target.
+         if self._ref_fps is None:
+             self._ref_fps = {}
+             gen = rdFingerprintGenerator.GetMorganGenerator(
+                 radius=self.sim_radius, fpSize=self.sim_bits
+             )
+             for target, ligands in REFERENCE_LIGANDS.items():
+                 self._ref_fps[target] = []
+                 for name, smi in ligands:
+                     mol = Chem.MolFromSmiles(smi)
+                     if mol:
+                         fp = gen.GetFingerprint(mol)
+                         self._ref_fps[target].append((name, fp))
+         return self._ref_fps
+
+     def _load_db_ligand_fps(self):
+         if self._db_ligand_fps is None and self.db_ligands_path:
+             with open(self.db_ligands_path) as f:
+                 db_ligands = json.load(f)
+
+             gen = rdFingerprintGenerator.GetMorganGenerator(
+                 radius=self.sim_radius, fpSize=self.sim_bits
+             )
+             self._db_ligand_fps = {}
+             for target in TOX21_TARGETS:
+                 if target not in db_ligands:
+                     continue
+                 self._db_ligand_fps[target] = []
+                 # Only the first 10 ligands listed for each target are used.
+                 for lig in db_ligands[target][:10]:
+                     smi = lig.get("smiles", "")
+                     name = lig.get("name", "unknown")[:20]
+                     mol = Chem.MolFromSmiles(smi)
+                     if mol:
+                         fp = gen.GetFingerprint(mol)
+                         self._db_ligand_fps[target].append((name, fp))
+         return self._db_ligand_fps
+
+     def extract_features(self, smiles_list):
+         """Return (features, valid_mask).
+
+         features maps a group name to a float32 array with one row per input
+         SMILES; rows for inputs that fail parsing or standardization are NaN.
+         """
+         standardizer = self._get_standardizer()
+         mols = []
+         valid_mask = []
+
+         for smi in smiles_list:
+             mol = Chem.MolFromSmiles(smi)
+             if mol is None:
+                 valid_mask.append(False)
+                 continue
+
+             std_mol, _ = standardizer.standardize_mol(mol)
+             if std_mol is None:
+                 valid_mask.append(False)
+                 continue
+
+             mols.append(std_mol)
+             valid_mask.append(True)
+
+         valid_mask = np.array(valid_mask)
+         n_total = len(smiles_list)
+         n_valid = len(mols)
+
+         features = {}
+
+         ecfps = self._compute_ecfp(mols)
+         features["ecfps"] = self._fill(ecfps, valid_mask, n_total)
+
+         maccs = self._compute_maccs(mols)
+         features["maccs"] = self._fill(maccs, valid_mask, n_total)
+
+         rdkit_descrs = self._compute_rdkit_descriptors(mols)
+         features["rdkit_descrs"] = self._fill(rdkit_descrs, valid_mask, n_total)
+
+         if self.toxicophores_path:
+             tox = self._compute_toxicophore_features(mols)
+             features["tox"] = self._fill(tox, valid_mask, n_total)
+
+         if self.use_rdkit_filters:
+             filters = self._compute_rdkit_filter_features(mols)
+             features["rdkit_filters"] = self._fill(filters, valid_mask, n_total)
+
+         if self.use_similarity:
+             sim = self._compute_similarity_features(mols)
+             features["similarity"] = self._fill(sim, valid_mask, n_total)
+
+             max_sim = self._compute_max_similarity_features(mols)
+             features["max_similarity"] = self._fill(max_sim, valid_mask, n_total)
+
+         if self.use_db_ligands and self.db_ligands_path:
+             db_sim = self._compute_db_ligand_similarity(mols)
+             features["db_similarity"] = self._fill(db_sim, valid_mask, n_total)
+
+         return features, valid_mask
+
+     def _fill(self, features, mask, n_total):
+         # Scatter the rows computed for valid molecules back to their original
+         # positions; rows of invalid inputs stay NaN.
+         n_features = features.shape[1] if len(features.shape) > 1 else 1
+         filled = np.full((n_total, n_features), np.nan, dtype=np.float32)
+         filled[mask] = features
+         return filled
+
+     def _compute_ecfp(self, mols):
+         ecfps = []
+         gen = rdFingerprintGenerator.GetMorganGenerator(
+             countSimulation=True, fpSize=self.ecfp_bits, radius=self.ecfp_radius
+         )
+         for mol in mols:
+             fp = gen.GetCountFingerprint(mol)
+             arr = np.zeros((self.ecfp_bits,), dtype=np.float32)
+             DataStructs.ConvertToNumpyArray(fp, arr)
+             ecfps.append(arr)
+         # reshape keeps a 2-D (0, ecfp_bits) result when no molecule is valid,
+         # so _fill still sees the right feature width.
+         return np.array(ecfps, dtype=np.float32).reshape(-1, self.ecfp_bits)
+
+     def _compute_maccs(self, mols):
+         maccs = []
+         for mol in mols:
+             fp = MACCSkeys.GenMACCSKeys(mol)
+             arr = np.zeros((167,), dtype=np.float32)
+             DataStructs.ConvertToNumpyArray(fp, arr)
+             maccs.append(arr)
+         # Same empty-input guard as in _compute_ecfp.
+         return np.array(maccs, dtype=np.float32).reshape(-1, 167)
+
+     def _compute_rdkit_descriptors(self, mols):
+         descrs_list = []
+         for mol in mols:
+             descrs = []
+             for _, fn in Descriptors._descList:
+                 try:
+                     val = fn(mol)
+                     # Descriptors that are missing or non-finite are set to 0.
+                     if val is None or np.isnan(val) or np.isinf(val):
+                         val = 0.0
+                 except Exception:
+                     val = 0.0
+                 descrs.append(val)
+             # Keep only the descriptor columns listed in USED_200_DESCR.
+             descrs = np.array(descrs)[USED_200_DESCR]
+             descrs_list.append(descrs)
+         return np.array(descrs_list, dtype=np.float32).reshape(
+             -1, len(USED_200_DESCR)
+         )
+
+     def _compute_toxicophore_features(self, mols):
+         patterns = self._load_toxicophores()
+         features = np.zeros((len(mols), len(patterns)), dtype=np.float32)
+         for i, mol in enumerate(mols):
+             for j, (name, pat) in enumerate(patterns):
+                 if mol.HasSubstructMatch(pat):
+                     features[i, j] = 1.0
+         return features
+
+     def _compute_rdkit_filter_features(self, mols):
+         catalogs = self._load_filter_catalogs()
+         n_features = sum(cat.GetNumEntries() for cat in catalogs.values())
+         features = np.zeros((len(mols), n_features), dtype=np.float32)
+
+         for mol_idx, mol in enumerate(mols):
+             feat_idx = 0
+             for cat_name, catalog in catalogs.items():
+                 for i in range(catalog.GetNumEntries()):
+                     entry = catalog.GetEntryWithIdx(i)
+                     if entry.HasFilterMatch(mol):
+                         features[mol_idx, feat_idx] = 1.0
+                     feat_idx += 1
+         return features
+
+     def _compute_similarity_features(self, mols):
+         ref_fps = self._load_ref_fps()
+         n_features = sum(len(fps) for fps in ref_fps.values())
+         features = np.zeros((len(mols), n_features), dtype=np.float32)
+
+         gen = rdFingerprintGenerator.GetMorganGenerator(
+             radius=self.sim_radius, fpSize=self.sim_bits
+         )
+         for mol_idx, mol in enumerate(mols):
+             mol_fp = gen.GetFingerprint(mol)
+             feat_idx = 0
+             for target in REFERENCE_LIGANDS.keys():
+                 for name, ref_fp in ref_fps[target]:
+                     features[mol_idx, feat_idx] = DataStructs.TanimotoSimilarity(
+                         mol_fp, ref_fp
+                     )
+                     feat_idx += 1
+         return features
+
+     def _compute_max_similarity_features(self, mols):
+         ref_fps = self._load_ref_fps()
+         features = np.zeros((len(mols), len(TOX21_TARGETS)), dtype=np.float32)
+
+         gen = rdFingerprintGenerator.GetMorganGenerator(
+             radius=self.sim_radius, fpSize=self.sim_bits
+         )
+         for mol_idx, mol in enumerate(mols):
+             mol_fp = gen.GetFingerprint(mol)
+             for target_idx, target in enumerate(TOX21_TARGETS):
+                 if target in ref_fps and ref_fps[target]:
+                     sims = [
+                         DataStructs.TanimotoSimilarity(mol_fp, fp)
+                         for _, fp in ref_fps[target]
+                     ]
+                     features[mol_idx, target_idx] = max(sims)
+         return features
+
+     def _compute_db_ligand_similarity(self, mols):
+         db_fps = self._load_db_ligand_fps()
+         if not db_fps:
+             return np.zeros((len(mols), 0), dtype=np.float32)
+
+         n_features = sum(len(fps) for fps in db_fps.values())
+         features = np.zeros((len(mols), n_features), dtype=np.float32)
+
+         gen = rdFingerprintGenerator.GetMorganGenerator(
+             radius=self.sim_radius, fpSize=self.sim_bits
+         )
+         for mol_idx, mol in enumerate(mols):
+             mol_fp = gen.GetFingerprint(mol)
+             feat_idx = 0
+             for target in TOX21_TARGETS:
+                 if target not in db_fps:
+                     continue
+                 for name, ref_fp in db_fps[target]:
+                     features[mol_idx, feat_idx] = DataStructs.TanimotoSimilarity(
+                         mol_fp, ref_fp
+                     )
+                     feat_idx += 1
+         return features
+
+
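The similarity groups computed by `EnhancedFeatureExtractor` are Tanimoto coefficients between Morgan fingerprints: one column per reference ligand, plus one per-target maximum. The following is a minimal pure-Python sketch of that arithmetic. It assumes fingerprints are represented as sets of on-bit indices, which is an illustration only; the class itself uses RDKit bit vectors and `DataStructs.TanimotoSimilarity`. The helper names `tanimoto` and `max_similarity` are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def max_similarity(query: set[int], references: list[set[int]]) -> float:
    """Best Tanimoto score of `query` against any reference; 0.0 if none."""
    return max((tanimoto(query, ref) for ref in references), default=0.0)


query = {1, 2, 3}
refs = [{2, 3, 4}, {1, 2, 3, 4}]
per_ligand = [tanimoto(query, r) for r in refs]  # one column per reference ligand
best = max_similarity(query, refs)               # one column per target
```

In this toy case `per_ligand` would feed a "similarity"-style group and `best` a "max_similarity"-style group.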
+ class _Standardizer:
+     """Lazily constructs RDKit standardization helpers and applies them."""
+
+     def __init__(self):
+         self._taut_enumerator = None
+         self._uncharger = None
+         self._lfrag_chooser = None
+
+     @property
+     def taut_enumerator(self):
+         if self._taut_enumerator is None:
+             self._taut_enumerator = rdMolStandardize.TautomerEnumerator()
+         return self._taut_enumerator
+
+     @property
+     def uncharger(self):
+         if self._uncharger is None:
+             self._uncharger = rdMolStandardize.Uncharger()
+         return self._uncharger
+
+     @property
+     def lfrag_chooser(self):
+         if self._lfrag_chooser is None:
+             self._lfrag_chooser = rdMolStandardize.LargestFragmentChooser()
+         return self._lfrag_chooser
+
+     def standardize_mol(self, mol_in):
+         """Return (standardized mol, canonical SMILES), or (None, None) on failure."""
+         try:
+             # Remove explicit hydrogens; isotopic hydrogens are removed and tracked.
+             params = Chem.RemoveHsParameters()
+             params.removeAndTrackIsotopes = True
+             mol = Chem.RemoveHs(mol_in, params, sanitize=False)
+             mol = rdMolStandardize.Cleanup(mol)
+             Chem.SanitizeMol(mol)
+             Chem.AssignStereochemistry(mol)
+             # Keep the largest fragment, then neutralize charges where possible.
+             mol = self.lfrag_chooser.choose(mol)
+             mol = self.uncharger.uncharge(mol)
+             Chem.SanitizeMol(mol)
+             mol = Chem.RemoveHs(Chem.AddHs(mol))
+             can_smiles = Chem.MolToSmiles(mol)
+             return mol, can_smiles
+         except Exception:
+             return None, None
+
+
+ def get_feature_counts(toxicophores_path=None, db_ligands_path=None):
+     # Counts assume the extractor's default ecfp_bits and that every SMARTS
+     # and SMILES entry parses; entries skipped at load time are not subtracted.
+     counts = {
+         "ecfps": 8192,
+         "maccs": 167,
+         # Matches the column selection applied in _compute_rdkit_descriptors.
+         "rdkit_descrs": len(USED_200_DESCR),
+     }
+
+     if toxicophores_path:
+         with open(toxicophores_path) as f:
+             tox_data = json.load(f)
+         counts["tox"] = len(tox_data)
+
+     rdkit_count = 0
+     for cat_type in [
+         FilterCatalogParams.FilterCatalogs.PAINS,
+         FilterCatalogParams.FilterCatalogs.BRENK,
+         FilterCatalogParams.FilterCatalogs.NIH,
+         FilterCatalogParams.FilterCatalogs.ZINC,
+     ]:
+         params = FilterCatalogParams()
+         params.AddCatalog(cat_type)
+         rdkit_count += FilterCatalog(params).GetNumEntries()
+     counts["rdkit_filters"] = rdkit_count
+
+     counts["similarity"] = sum(len(ligs) for ligs in REFERENCE_LIGANDS.values())
+     counts["max_similarity"] = len(TOX21_TARGETS)
+
+     if db_ligands_path:
+         with open(db_ligands_path) as f:
+             db_ligands = json.load(f)
+         counts["db_similarity"] = sum(min(len(v), 10) for v in db_ligands.values())
+
+     return counts
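The dictionary returned by `get_feature_counts` gives a width per feature group. The sketch below shows one way those widths could be turned into column ranges, assuming the groups are concatenated column-wise in dictionary order; that concatenation step is not shown in this file, so treat it as an assumption. `group_slices` is a hypothetical helper, and the counts are illustrative values rather than output from the repository.

```python
def group_slices(counts: dict[str, int]) -> dict[str, slice]:
    """Map each feature group to its column range in a concatenated matrix."""
    slices = {}
    start = 0
    for name, width in counts.items():  # dicts preserve insertion order
        slices[name] = slice(start, start + width)
        start += width
    return slices


# Illustrative counts only; real values would come from get_feature_counts().
example_counts = {"ecfps": 8192, "maccs": 167, "max_similarity": 12}
slices = group_slices(example_counts)
total_width = sum(example_counts.values())
```

Under that assumption, `total_width` is the value that would be passed as the model's `in_features`.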
src/model.py ADDED
@@ -0,0 +1,33 @@
+ import torch
+ import torch.nn as nn
+
+
+ class Tox21SNN(nn.Module):
+     """Self-normalizing MLP (SELU + AlphaDropout) with 12 output logits."""
+
+     def __init__(self, in_features, hidden_dim=768, n_layers=8, dropout=0.05):
+         super().__init__()
+         self.in_features = in_features
+         self.hidden_dim = hidden_dim
+         self.n_layers = n_layers
+
+         activation = nn.SELU()
+         drop = nn.AlphaDropout(p=dropout)
+
+         # dims has n_layers + 1 entries and its last entry is overwritten
+         # with 12, so the last SELU block maps hidden_dim -> 12 and the
+         # final linear layer maps 12 -> 12.
+         dims = [hidden_dim] * (n_layers + 1)
+         dims[0] = in_features
+         dims[-1] = 12
+
+         layers = []
+         for i in range(n_layers + 1):
+             in_dim = dims[i]
+             out_dim = dims[-1] if i == n_layers else dims[i + 1]
+             fc = nn.Linear(in_dim, out_dim)
+             if i < n_layers:
+                 layers.extend([fc, activation, drop])
+             else:
+                 layers.append(fc)
+
+         self.model = nn.Sequential(*layers)
+
+     def forward(self, x):
+         return self.model(x)
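To make the layer layout built in `Tox21SNN.__init__` easier to check, here is a small sketch that reproduces only the shape bookkeeping, without PyTorch. `linear_shapes` and its `n_outputs` parameter are hypothetical (the model hard-codes 12 outputs), and the input width 11369 is taken from the commit message, not from this file.

```python
def linear_shapes(in_features: int, hidden_dim: int = 768, n_layers: int = 8,
                  n_outputs: int = 12) -> list[tuple[int, int]]:
    """Reproduce the (in_dim, out_dim) pairs of the linear layers."""
    dims = [hidden_dim] * (n_layers + 1)
    dims[0] = in_features
    dims[-1] = n_outputs
    shapes = []
    for i in range(n_layers + 1):
        # Same indexing as the model: the final layer reuses dims[-1].
        out_dim = dims[-1] if i == n_layers else dims[i + 1]
        shapes.append((dims[i], out_dim))
    return shapes


shapes = linear_shapes(in_features=11369)
for in_dim, out_dim in shapes:
    print(f"Linear({in_dim}, {out_dim})")
```

With the defaults this gives nine linear layers: one input projection, six 768-wide hidden layers, a 768-to-12 layer that is still followed by SELU and AlphaDropout, and a final 12-to-12 layer.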