## Short description

MolE learns task-independent molecular representations of chemicals via Graph Isomorphism Networks (GINs). Combined with an XGBoost classifier, it estimates the probability of a compound inhibiting bacterial growth. The model was developed by Roberto Olayo Alarcon et al.; more information can be found in the [GitHub repository](https://github.com/rolayoalarcon/MolE) and the [accompanying paper](https://www.nature.com/articles/s41467-025-58804-4).

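
For context, each GIN layer updates a node's embedding by summing its neighbours' embeddings and passing the result through an MLP; this is the standard update rule from Xu et al. (2019) that GIN-based encoders stack to build a graph-level representation:

$$
h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \left(1+\epsilon^{(k)}\right) h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
$$

Pooling the final node states over the molecular graph yields the fixed-size, task-independent embedding used downstream.
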
## Model versions
Repository: <https://github.com/rolayoalarcon/mole_antimicrobial_potential>\
Hugging Face Hub: `virtual-human-chc/MolE`
## Long description

MolE integrates molecular graph-based representation learning with gradient-boosted decision trees for predicting antimicrobial potential. The approach involves:

1. **Representation learning:** a graph neural network (GINet) trained on 100,000 randomly sampled compounds to derive molecular embeddings from SMILES strings.
2. **Prediction:** these embeddings are used as input to an **XGBoost** model that predicts antimicrobial activity scores across 40 bacterial strains, based on data from *Maier et al., 2018*.

The model was developed by **Roberto Olayo Alarcon et al.**
Further information is available in the [paper](https://www.nature.com/articles/s41467-025-58804-4).
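
The two-stage flow can be sketched as follows. Here `embed` and `score_strains` are toy stand-ins for GINet and the trained XGBoost classifier (not the actual MolE API); the sketch only illustrates how a fixed-size embedding decouples representation learning from per-strain prediction.

```python
from typing import Dict, List

N_STRAINS = 40  # strains screened by Maier et al., 2018

def embed(smiles: str) -> List[float]:
    # Stand-in for the pretrained GINet encoder: any function mapping a
    # SMILES string to a fixed-size vector. Here, a toy character histogram.
    vec = [0.0] * 16
    for ch in smiles:
        vec[ord(ch) % 16] += 1.0
    norm = sum(vec) or 1.0
    return [v / norm for v in vec]

def score_strains(embedding: List[float]) -> Dict[str, float]:
    # Stand-in for the XGBoost stage: one inhibition score per strain.
    base = sum(x * x for x in embedding)
    return {f"strain_{i:02d}": base / (i + 1) for i in range(N_STRAINS)}

scores = score_strains(embed("CCO"))  # toy scores for ethanol's SMILES
```

Any encoder with the same interface could be swapped in; the per-strain classifier only ever sees the embedding, never the raw molecule.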
## Metadata
### Input
- **Training data:** 100,000 randomly sampled compounds from ChemBERTa for the pretraining of MolE, plus data from *Maier et al., 2018* on the effect of 1,197 marketed drugs on the growth of 40 bacterial strains for the XGBoost classifier
- **Publication:** [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4)
### Output
- **Description:** For each compound, the model predicts growth inhibition scores for 40 different bacterial strains.
- **Example output file:** `examples/output/example_molecules_prediction.tsv`
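
Since the output is a per-strain score table, it can be post-processed with the standard library alone. The snippet below works on an in-memory stand-in for the TSV; the column names (`compound` plus one column per strain) and the values are an assumed layout for illustration, not the real file's verified headers.

```python
import csv
import io

# Two-row stand-in for examples/output/example_molecules_prediction.tsv;
# headers and scores here are illustrative, not taken from the real file.
tsv = (
    "compound\tE_coli\tB_subtilis\n"
    "drug_a\t0.91\t0.87\n"
    "drug_b\t0.05\t0.03\n"
)
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
scores = {
    row["compound"]: {k: float(v) for k, v in row.items() if k != "compound"}
    for row in rows
}
# Flag compounds whose best score across strains clears a chosen cutoff.
hits = [name for name, per_strain in scores.items()
        if max(per_strain.values()) > 0.5]
# hits == ["drug_a"]
```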
## Installation
Install the conda environment with all dependencies:

```python
pred = mole.predict_from_smiles("examples/input/examples_molecules.tsv")
print(pred)
```

## References

1. Roberto Olayo Alarcon et al., *MolE: Graph-based molecular representation learning for antimicrobial discovery*, [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4).
4. GitHub repository: <https://github.com/rolayoalarcon/mole_antimicrobial_potential>.
5. Model weights (Zenodo DOI): <https://doi.org/10.5281/zenodo.10803099>.

## Copyright

Code derived from <https://github.com/rolayoalarcon/MolE> is licensed under the **MIT License**, © 2024 Roberto Olayo Alarcon. Model weights are licensed under **Creative Commons Attribution 4.0 International (CC BY 4.0)**, © 2024 Roberto Olayo Alarcon. Additional code © 2025 Maksim Pavlov, licensed under MIT.