
OpenADMET ExpansionRx Challenge - Methodology Report

Account Name: WakuwakuADMET


1. Model Description

  • Algorithm:

    • Molecular Graph Neural Networks

      • Chemprop (Caco-2 Permeability Papp A>B, HLM CLint, MLM CLint)
      • AttentiveFP (LogD, KSOL, Caco-2 Permeability Efflux)
      • DimeNet (MPPB, MBPB)
    • Linear Regression

      • 1D linear regression model using the MBPB prediction value as an input feature (MGMB)
  • Training Strategy:

    • Single-task
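
The MGMB model above is a one-feature regression on the MBPB model's output. A minimal pure-Python sketch of that step (function names and data are illustrative; the actual pipeline is not public):

```python
# Fit y = a*x + b, where x is the MBPB model's prediction and y is the
# MGMB target. Illustrative sketch, not the competition code.

def fit_1d_linear(x, y):
    """Ordinary least squares for a single input feature."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Example: MGMB tracks MBPB closely in the training data.
mbpb_pred = [0.10, 0.35, 0.60, 0.80]   # MBPB model outputs (hypothetical)
mgmb_true = [0.12, 0.33, 0.62, 0.79]   # measured MGMB targets (hypothetical)
a, b = fit_1d_linear(mbpb_pred, mgmb_true)
mgmb_pred = [a * x + b for x in mbpb_pred]
```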

2. External Data

  • External Data Sources: In-house (not public)
| Endpoint | External Data Use |
|---|---|
| LogD | Yes |
| KSOL | Yes |
| MLM CLint | No |
| HLM CLint | No |
| Caco-2 Permeability Efflux | Yes |
| Caco-2 Permeability Papp A>B | No |
| MPPB | No |
| MBPB | No |
| MGMB | No |

3. Performance Comments

  • Performance is consistent between the training and validation sets.
  • The use of external datasets was effective for LogD and Caco-2 Permeability Efflux.
  • The binding tasks (MPPB, MBPB, and MGMB) exhibit moderate to strong correlations with each other. Therefore, transfer learning for the MGMB task using the MBPB prediction value as an input feature was effective.
  • QM representation on molecular graphs and 3D graph neural networks were effective for binding tasks.

4. Ensemble Strategy

  • Aggregation Method: Simple averaging
  • Model Diversity: Different CV folds
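
The aggregation step above amounts to averaging the per-fold model outputs element-wise. A minimal sketch (values are illustrative):

```python
# Simple-averaging ensemble over models trained on different CV folds.

def ensemble_average(fold_predictions):
    """fold_predictions: one prediction list per fold model, equal lengths.
    Returns the element-wise mean across folds."""
    n_folds = len(fold_predictions)
    return [sum(p) / n_folds for p in zip(*fold_predictions)]

# Predictions for 3 molecules from 5 fold models (hypothetical numbers).
folds = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.1],
    [0.9, 2.1, 2.9],
    [1.1, 2.0, 3.0],
    [0.8, 2.1, 3.0],
]
avg = ensemble_average(folds)  # one ensembled prediction per molecule
```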

5. Additional Features / Molecular Representations

  • Fingerprints/Descriptors: Not used
  • Learned Embeddings: QM representations on molecular graphs were obtained using AIMNet2 (MPPB, MBPB).
  • Prediction Value: The MBPB prediction value was used as an input feature for the MGMB prediction task, owing to the strong correlation between MBPB and MGMB in the training dataset.

6. Data Preprocessing

  • Target Transformation: log10 for all endpoints except LogD
  • Zero/Missing Value Handling:

    • Replace 0 with the smallest non-zero value (LogD, KSOL, Caco-2 Permeability Efflux, Caco-2 Permeability Papp A>B)
    • Exclude zeros (HLM CLint, MLM CLint, MPPB, MBPB, MGMB)
  • SMILES Standardization: RDKit canonicalization
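
The zero handling and log10 transform above can be sketched in a few lines of pure Python (endpoint grouping follows the description; the example values are illustrative):

```python
import math

# Endpoints whose zeros are replaced with the smallest non-zero value,
# vs. endpoints whose zero rows are excluded, per the preprocessing above.
REPLACE_ZERO = {"LogD", "KSOL", "Caco-2 Efflux", "Caco-2 Papp A>B"}
EXCLUDE_ZERO = {"HLM CLint", "MLM CLint", "MPPB", "MBPB", "MGMB"}

def preprocess_targets(endpoint, values):
    """Apply zero handling, then log10 for all endpoints except LogD."""
    if endpoint in REPLACE_ZERO:
        floor = min(v for v in values if v > 0)
        values = [v if v > 0 else floor for v in values]
    elif endpoint in EXCLUDE_ZERO:
        values = [v for v in values if v > 0]
    if endpoint != "LogD":
        values = [math.log10(v) for v in values]
    return values

# Zeros replaced for KSOL, excluded for MLM CLint (hypothetical data).
ksol = preprocess_targets("KSOL", [0.0, 1.0, 10.0, 100.0])
mlm = preprocess_targets("MLM CLint", [0.0, 10.0, 100.0])
```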

7. Loss Function / Validation / Split Strategy

  • Loss Type: MSE
  • Cross-Validation: 5-fold CV
  • Split Method: Random split
  • Early Stopping: Yes
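
A minimal sketch of this validation setup, i.e. a seeded random 5-fold split plus a patience-based early-stopping rule (helper names, patience value, and loss curve are assumptions, not the actual training code):

```python
import random

def five_fold_split(n_samples, n_folds=5, seed=42):
    """Shuffle sample indices and deal them round-robin into folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def best_epoch(val_losses, patience=3):
    """Return the epoch early stopping would keep (lowest validation
    loss), halting once there is no improvement for `patience` epochs."""
    best, best_i, waited = float("inf"), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_i

folds = five_fold_split(10)                       # 5 disjoint folds
losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.8]    # hypothetical loss curve
stop_at = best_epoch(losses)                      # epoch kept by early stopping
```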

8. Negative Results / What Didn't Work

  • For some endpoints (Caco-2 Papp A>B, MPPB, HLM CLint), our in-house data did not improve the validation R-squared, either as additional training data or as pretraining data. We suspect differences in wet-lab assay protocols caused a distribution shift between the datasets.
