OpenADMET ExpansionRx Challenge - Methodology Report

Account Name: WakuwakuADMET

1. Model Description

Algorithm:
- Molecular Graph Neural Networks
  - Chemprop (Caco-2 Permeability Papp A>B, HLM CLint, MLM CLint)
  - AttentiveFP (LogD, KSOL, Caco-2 Permeability Efflux)
  - DimeNet (MPPB, MBPB)
- Linear Regression
  - Univariate linear regression model using the MBPB prediction value as its single input feature (MGMB)
Training Strategy:
- Single-task

2. External Data

External Data Sources: In-house (not public)

Endpoint	External Data Use
LogD	Yes
KSOL	Yes
MLM CLint	No
HLM CLint	No
Caco-2 Permeability Efflux	Yes
Caco-2 Permeability Papp A>B	No
MPPB	No
MBPB	No
MGMB	No

3. Performance Comments

Performance is consistent between the training and validation sets.
The use of external datasets was effective for LogD and Caco-2 Permeability Efflux.
The binding tasks (MPPB, MBPB, and MGMB) exhibit moderate to strong correlations with each other. Therefore, transfer learning for the MGMB task using the MBPB prediction value as an input feature was effective.
QM representations on molecular graphs and 3D graph neural networks were effective for the binding tasks.

4. Ensemble Strategy

Aggregation Method: Simple averaging
Model Diversity: Different CV folds
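
As an illustration of simple averaging over fold models, the following is a minimal sketch using only the Python standard library. The function name `average_fold_predictions` and the example numbers are hypothetical; the report does not state whether averaging was done on the transformed (log10) scale or on the original scale, so the sketch just averages whatever values it is given.

```python
from statistics import fmean


def average_fold_predictions(fold_predictions):
    """Average per-molecule predictions from models trained on different CV folds.

    fold_predictions: list of equal-length lists, one list per fold model.
    """
    if not fold_predictions:
        raise ValueError("need at least one fold model")
    n_molecules = len(fold_predictions[0])
    if any(len(p) != n_molecules for p in fold_predictions):
        raise ValueError("all fold models must predict the same molecules")
    # zip(*...) groups the predictions for each molecule across fold models.
    return [fmean(per_molecule) for per_molecule in zip(*fold_predictions)]


# Hypothetical predictions from three fold models for two molecules.
preds = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
ensemble = average_fold_predictions(preds)
```

With 5-fold CV, `fold_predictions` would hold five lists, one per fold model, all predicting the same test molecules.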

5. Additional Features / Molecular Representations

Fingerprints/Descriptors: Not used
Learned Embeddings: QM representations on molecular graphs were obtained using AIMNet2 (MPPB, MBPB).
Prediction Value: The MBPB prediction value was used as an input feature for the MGMB prediction task because MBPB and MGMB are strongly correlated in the training dataset.
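
The MGMB model described above can be sketched as an ordinary least-squares fit with one input. This is a minimal illustration, not the submitted implementation: the helper `fit_line` and all numbers are hypothetical, and it assumes both quantities are on the log10 scale used for the targets.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical training data (log10 scale).
mbpb_pred = [0.0, 1.0, 2.0, 3.0]   # predictions from the MBPB model
mgmb_true = [0.5, 1.5, 2.5, 3.5]   # measured MGMB values
slope, intercept = fit_line(mbpb_pred, mgmb_true)

# Predict MGMB for a new molecule from its MBPB prediction.
mgmb_pred = [slope * v + intercept for v in [1.5]]
```

Because the only input is another model's output, errors in the MBPB prediction propagate directly into the MGMB prediction.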

6. Data Preprocessing

Target Transformation: log10 for all endpoints except LogD
Zero/Missing Value Handling: Zeros were replaced with the smallest non-zero value (LogD, KSOL, Caco-2 Permeability Efflux, Caco-2 Permeability Papp A>B) or excluded (HLM CLint, MLM CLint, MPPB, MBPB, MGMB)
SMILES Standardization: RDKit canonicalization
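
The two zero-handling policies and the log10 transform can be sketched as follows. This is a dependency-free illustration under stated assumptions: the helper names and values are hypothetical, measurements are assumed to be non-negative, and SMILES canonicalization with RDKit is left out to keep the example self-contained.

```python
import math


def replace_zeros(values):
    """Replace exact zeros with the smallest non-zero value in the list."""
    floor = min(v for v in values if v != 0)  # raises ValueError if all are zero
    return [floor if v == 0 else v for v in values]


def drop_zeros(values):
    """Exclude exact zeros."""
    return [v for v in values if v != 0]


def log10_transform(values):
    """Apply log10; inputs must be strictly positive."""
    return [math.log10(v) for v in values]


# Hypothetical raw measurements.
ksol = [0.0, 10.0, 100.0]   # zero replaced by the smallest non-zero value
mppb = [0.0, 1.0, 1000.0]   # zero excluded

ksol_log = log10_transform(replace_zeros(ksol))
mppb_log = log10_transform(drop_zeros(mppb))
```

Replacing zeros keeps every molecule in the training set, while excluding them shortens the list, so the matching SMILES entries would need to be dropped as well.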

7. Loss Function / Validation / Split Strategy

Loss Type: MSE
Cross-Validation: 5-fold CV
Split Method: Random split
Early Stopping: Yes
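
A minimal sketch of the validation setup listed above: a random 5-fold split, MSE as the loss, and patience-based early stopping. The function names, the seed, and the `patience` value are assumptions for illustration; the report does not give the stopping criterion that was actually used.

```python
import random


def random_kfold(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, val_indices) for a random k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch, stopping after `patience` epochs without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # later epochs are never evaluated
    return best_epoch


splits = list(random_kfold(10))
example_loss = mse([1.0, 2.0], [1.0, 4.0])
best = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95, 0.5])
```

In this example training stops at epoch 4, so the lower loss at epoch 5 is never seen and epoch 1 is kept as the best checkpoint.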

8. Negative Results / What Didn't Work

For some endpoints (Caco-2 Permeability Papp A>B, MPPB, HLM CLint), our in-house data did not improve the validation R-squared, whether used as additional training data or as pretraining data. We suspect this is caused by differences in wet-lab assay protocols, which lead to a distribution shift between the datasets.

9. References

Chemprop: https://github.com/chemprop/chemprop
AttentiveFP: https://pubs.acs.org/doi/10.1021/acs.jmedchem.9b00959
DimeNet: https://arxiv.org/abs/2003.03123
AIMNet2: https://chemrxiv.org/engage/chemrxiv/article-details/6763b51281d2151a022fb6a5
