Update README.md
README.md
CHANGED
|
@@ -32,52 +32,43 @@ MolE learns task-independent molecular representations of chemicals via Graph Is
|
|
| 32 |
|
| 33 |
## Long description
|
| 34 |
|
| 35 |
-
MolE integrates molecular graph-based representation learning with gradient-boosted decision trees for predicting antimicrobial potential. The approach involves:
|
| 36 |
-
|
| 37 |
-
|
| 38 |
|
| 39 |
## Metadata
|
| 40 |
|
| 41 |
### Input
|
| 42 |
|
| 43 |
-
-
|
| 44 |
-
|
| 45 |
-
-
|
| 46 |
-
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
`C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]`).
|
| 50 |
-
- **Example input file:** `examples/input/examples_molecules.tsv`
|
| 51 |
-
|
| 52 |
|
| 53 |
### Model
|
| 54 |
|
| 55 |
-
-
|
| 56 |
-
-
|
| 57 |
-
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
growth of a given bacterial strain.
|
| 65 |
-
- **Training data:** 100000 randomly sampled compounds from ChemBERTa for the pretraining of MolE and data from *Maier et al., 2018* containing the influence of 1197 marketed drugs on the growth of 40 bacterial strains for the XGBoost classifier
|
| 66 |
-
- **Publication:** [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4)
|
| 67 |
|
| 68 |
### Output
|
| 69 |
|
| 70 |
-
-
|
| 71 |
-
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
- `antimicrobial_predictive_probability` *(float)*: Predicted
|
| 78 |
-
probability that the compound inhibits microbial growth.
|
| 79 |
-
- **Example output file:**
|
| 80 |
-
`examples/output/example_molecules_prediction.tsv`
|
| 81 |
|
| 82 |
## Installation
|
| 83 |
|
|
|
|
| 32 |
|
| 33 |
## Long description
|
| 34 |
|
| 35 |
+
MolE integrates molecular graph-based representation learning with gradient-boosted decision trees for predicting antimicrobial potential. The approach involves:
|
| 36 |
+
1. **Representation learning:** A graph neural network (GINet) is pre-trained on 100,000 randomly sampled compounds to derive
|
| 37 |
+
molecular embeddings from SMILES strings.
|
| 38 |
+
2. **Prediction:** These embeddings are the input to an **XGBoost** model that predicts antimicrobial activity scores across 40 bacterial strains, based on data from *Maier et al., 2018*. The model was developed by **Roberto Olayo Alarcon et al.** Further information is available in the [paper](https://www.nature.com/articles/s41467-025-58804-4).
|
| 39 |
|
| 40 |
## Metadata
|
| 41 |
|
| 42 |
### Input
|
| 43 |
|
| 44 |
+
- **Description:** Table of chemicals with their SMILES representations.
|
| 45 |
+
- **Shape:** `[n, 2]`
|
| 46 |
+
- **Data format:**
|
| 47 |
+
- `chem_name` *(str)*: Name of the molecule (e.g., *Halicin*).
|
| 48 |
+
- `smiles` *(str)*: SMILES representation of the molecule (e.g., `C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]`).
|
| 49 |
+
- **Example input file:** `examples/input/examples_molecules.tsv`
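
The input layout above can be checked with a short script. This is a minimal sketch, not part of the MolE code base: it assumes the file is tab-separated with a header row naming the two columns, and `read_input_table` is a hypothetical helper introduced only for illustration.

```python
import csv
import io

# In-memory stand-in for examples/input/examples_molecules.tsv.
# Assumption: tab-separated, with a header row naming the two columns.
EXAMPLE_TSV = (
    "chem_name\tsmiles\n"
    "Halicin\tC1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]\n"
)

EXPECTED_COLUMNS = ["chem_name", "smiles"]


def read_input_table(handle):
    """Read the [n, 2] input table and return (chem_name, smiles) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(
            f"expected columns {EXPECTED_COLUMNS}, got {reader.fieldnames}"
        )
    rows = []
    for row in reader:
        # A short row leaves the missing field as None, which is rejected here.
        if not row["chem_name"] or not row["smiles"]:
            raise ValueError(f"incomplete row: {row}")
        rows.append((row["chem_name"], row["smiles"]))
    return rows


rows = read_input_table(io.StringIO(EXAMPLE_TSV))
print(len(rows), "molecule(s) read")
```

To check a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.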
|
| 50 |
|
| 51 |
### Model
|
| 52 |
|
| 53 |
+
- **Modality:** Probability of microbial growth inhibition.
|
| 54 |
+
- **Scale:** Combination of bacterial strain and compound.
|
| 55 |
+
- **Description:**
|
| 56 |
+
- The model computes antimicrobial predictive probabilities for the **40 bacterial strains** studied in [Maier et al., 2018](https://www.nature.com/articles/nature25979), using a two-step process:
|
| 57 |
+
1. Generate molecular embeddings with a pre-trained **GINet** representation model (`model.pth`, `config.yaml`).
|
| 58 |
+
2. Predict antimicrobial properties with a **trained XGBoost** classifier (`MolE-XGBoost-08.03.2024_14.20.pkl`).
|
| 59 |
+
- The scores reflect the likelihood that a compound inhibits the growth of a given bacterial strain.
|
| 60 |
+
- **Training data:** 100,000 randomly sampled compounds from ChemBERTa for pretraining MolE; for the XGBoost classifier, data from *Maier et al., 2018* on the influence of 1,197 marketed drugs on the growth of 40 bacterial strains.
|
| 61 |
+
- **Publication:** [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4)
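
The two-step flow described above can be sketched as plain data flow. This is only an illustration under stated assumptions: `predict_all`, `toy_embed`, and `toy_score` are hypothetical names, and the toy functions are placeholders that stand in for the GINet model and the XGBoost classifier without reproducing either. How the real classifier combines an embedding with a strain is not shown here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Embedding = List[float]


def predict_all(
    molecules: Sequence[Tuple[str, str]],      # (chem_name, smiles) pairs
    strains: Sequence[str],                    # strain names
    embed: Callable[[str], Embedding],         # step 1 stand-in
    score: Callable[[Embedding, str], float],  # step 2 stand-in
) -> Dict[Tuple[str, str], float]:
    """Return one probability per (compound, strain) combination."""
    results: Dict[Tuple[str, str], float] = {}
    for chem_name, smiles in molecules:
        embedding = embed(smiles)  # computed once per compound
        for strain in strains:
            results[(chem_name, strain)] = score(embedding, strain)
    return results


# Toy stand-ins so the sketch runs; NOT the real GINet or XGBoost models.
def toy_embed(smiles: str) -> Embedding:
    return [float(len(smiles)), float(smiles.count("N"))]


def toy_score(embedding: Embedding, strain: str) -> float:
    return 0.5  # constant placeholder probability


molecules = [("Halicin", "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]")]
# The second strain name is made up for illustration.
strains = ["Akkermansia muciniphila (NT5021)", "hypothetical strain B"]
predictions = predict_all(molecules, strains, toy_embed, toy_score)
print(len(predictions), "predictions")
```

With `n` compounds and 40 strains, the loop yields `n × 40` entries, one per compound and strain combination.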
|
| 62 |
|
| 63 |
### Output
|
| 64 |
|
| 65 |
+
- **Description:** For each compound, the model predicts growth inhibition scores for 40 different bacterial strains.
|
| 66 |
+
- **Output format:** table
|
| 67 |
+
- **Shape:** `[n × 40, 2]`
|
| 68 |
+
- **Columns:**
|
| 69 |
+
- `pred_id` *(str)*: Combination of the molecule name and bacterial strain (e.g., `Halicin:Akkermansia muciniphila (NT5021)`).
|
| 70 |
+
- `antimicrobial_predictive_probability` *(float)*: Predicted probability that the compound inhibits microbial growth.
|
| 71 |
+
- **Example output file:** `examples/output/example_molecules_prediction.tsv`
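
The long-format output can be regrouped per compound with a few lines of Python. This is a minimal sketch with several assumptions: the file is tab-separated with a header row, `pred_id` is split at its last colon, the probability values and the second strain name below are made up, and `group_by_compound` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for examples/output/example_molecules_prediction.tsv.
# The probabilities and "hypothetical strain B" are invented for illustration.
EXAMPLE_OUTPUT = (
    "pred_id\tantimicrobial_predictive_probability\n"
    "Halicin:Akkermansia muciniphila (NT5021)\t0.25\n"
    "Halicin:hypothetical strain B\t0.75\n"
)


def group_by_compound(handle):
    """Turn the [n x 40, 2] table into {chem_name: {strain: probability}}."""
    per_compound = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: the last colon separates molecule name and strain.
        chem_name, sep, strain = row["pred_id"].rpartition(":")
        if not sep:
            raise ValueError(f"pred_id without ':' separator: {row['pred_id']}")
        probability = float(row["antimicrobial_predictive_probability"])
        per_compound[chem_name][strain] = probability
    return dict(per_compound)


scores = group_by_compound(io.StringIO(EXAMPLE_OUTPUT))
for chem_name, by_strain in scores.items():
    print(chem_name, "has scores for", len(by_strain), "strain(s)")
```

Splitting at the last colon is a judgment call: it tolerates colons in molecule names but would break if a strain name contained one.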
|
| 72 |
|
| 73 |
## Installation
|
| 74 |
|