pavm595 committed on
Commit 59fb564 · verified · 1 Parent(s): 6ca63b3

Update README.md

Files changed (1): README.md (+4 −23)

README.md CHANGED
@@ -22,9 +22,7 @@ language:
 
 ## Short description
 
-MolE learns task-independent molecular representations of chemicals vis Graph Isomorphism Networks (GINs). Combined with an XGBoost classifier it estimates the probability of a compound
-inhibiting bacterial growth. The model was developed by Roberto Olayo Alarcon et al. and more information can be found in the GitHub repository and the accompanying paper.
-
+MolE learns task-independent molecular representations of chemicals via Graph Isomorphism Networks (GINs). Combined with an XGBoost classifier, it estimates the probability of a compound inhibiting bacterial growth. The model was developed by Roberto Olayo Alarcon et al.; more information can be found in the [GitHub repository](https://github.com/rolayoalarcon/MolE) and the [accompanying paper](https://www.nature.com/articles/s41467-025-58804-4).
 
 ## Model versions
 
@@ -34,21 +32,12 @@ inhibiting bacterial growth. The model was developed by Roberto Olayo Alarcon et
 Repository: <https://github.com/rolayoalarcon/mole_antimicrobial_potential>\
 Hugging Face Hub: `virtual-human-chc/MolE`
 
-
 ## Long description
 
-MolE integrates molecular graph-based representation learning with
-gradient-boosted decision trees for predicting antimicrobial potential.
-The approach involves: 1. **Representation learning:** A graph neural
-network (GINet) trained on 100,000 randomly sampled compounds to derive
-molecular embeddings from SMILES strings. 2. **Prediction:** These
-embeddings are used as input to an **XGBoost** model that predicts
-antimicrobial activity scores across 40 bacterial strains, based on data from *Maier et al., 2018*.
-
-The model was developed by **Roberto Olayo Alarcon et al.**.
+MolE integrates molecular graph-based representation learning with gradient-boosted decision trees for predicting antimicrobial potential. The approach involves two steps: 1. **Representation learning:** a graph neural network (GINet) is trained on 100,000 randomly sampled compounds to derive molecular embeddings from SMILES strings. 2. **Prediction:** these embeddings are used as input to an **XGBoost** model that predicts antimicrobial activity scores across 40 bacterial strains, based on data from *Maier et al., 2018*. The model was developed by **Roberto Olayo Alarcon et al.**
 Further information is available in the [paper](https://www.nature.com/articles/s41467-025-58804-4).
 
-
 ## Metadata
 
 ### Input
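
The two-stage design described in the long description (pretrained graph embeddings fed into a separate classifier) can be sketched in a few lines. Everything below is a hypothetical stand-in: `embed` replaces the pretrained GINet and `score` replaces the trained XGBoost model; neither reflects the real MolE implementation, only the decoupling of representation learning from prediction.

```python
import math
from typing import List

def embed(smiles: str, dim: int = 8) -> List[float]:
    """Stand-in for the pretrained GINet: maps a SMILES string to a
    fixed-length, task-independent embedding vector (toy featurization,
    not real chemistry)."""
    vec = [0.0] * dim
    for i, ch in enumerate(smiles):
        vec[i % dim] += ord(ch) / 100.0
    return vec

def score(embedding: List[float]) -> float:
    """Stand-in for the trained XGBoost classifier: maps an embedding to
    a growth-inhibition probability in [0, 1] via a toy logistic scorer."""
    return 1.0 / (1.0 + math.exp(-0.01 * sum(embedding)))

# The stages compose: the embedding model is trained once, task-independently,
# and the downstream classifier consumes its output.
prob = score(embed("CCO"))  # "CCO" (ethanol) used here as an example SMILES
print(round(prob, 3))
```

The point of the sketch is the interface boundary: swapping in a different downstream task only requires retraining the second stage, not the embedding network.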
@@ -78,7 +67,6 @@ Further information is available in the [paper](https://www.nature.com/articles/
 - **Training data:** 100000 randomly sampled compounds from ChemBERTa for the pretraining of MolE and data from *Maier et al., 2018* containing the influence of 1197 marketed drugs on the growth of 40 bacterial strains for the XGBoost classifier
 - **Publication:** [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4)
 
-
 ### Output
 
 - **Description:** For each compound, the model predicts growth inhibition scores for 40 different bacterial strains.
@@ -93,7 +81,6 @@ Further information is available in the [paper](https://www.nature.com/articles/
 - **Example output file:**
   `examples/output/example_molecules_prediction.tsv`
 
-
 ## Installation
 
 Install the conda environment with all dependencies:
@@ -147,7 +134,6 @@ pred = mole.predict_from_smiles("examples/input/examples_molecules.tsv")
 print(pred)
 ```
 
-
 ## References
 
 1. Roberto Olayo Alarcon et al., *MolE: Graph-based molecular representation learning for antimicrobial discovery*, [Nature Communications (2025)](https://www.nature.com/articles/s41467-025-58804-4).
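
The prediction step shown above produces per-strain growth-inhibition scores, one row per compound. As a rough illustration of consuming such output, here is a sketch that ranks compounds by their mean score; the actual column layout of `examples/output/example_molecules_prediction.tsv` is not shown in this diff, so the compound/strain columns below are assumed, not taken from the real file.

```python
import csv
import io

# Inline stand-in for the prediction TSV (assumed layout: one row per
# compound, one numeric score column per bacterial strain).
tsv = (
    "compound\tstrain_A\tstrain_B\tstrain_C\n"
    "mol_1\t0.91\t0.40\t0.75\n"
    "mol_2\t0.10\t0.22\t0.05\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Mean predicted inhibition score across strains, per compound.
means = {
    r["compound"]: sum(float(v) for k, v in r.items() if k != "compound")
    / (len(r) - 1)
    for r in rows
}

# Rank compounds from most to least broadly inhibitory.
ranked = sorted(means, key=means.get, reverse=True)
print(ranked)
```

For a real run, replace the inline string with `open("examples/output/example_molecules_prediction.tsv")` and check the header row first, since the true column names may differ.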
@@ -156,14 +142,9 @@ print(pred)
 4. GitHub repository: <https://github.com/rolayoalarcon/mole_antimicrobial_potential>.
 5. Model weights (Zenodo DOI): <https://doi.org/10.5281/zenodo.10803099>.
 
-
 ## Copyright
 
-Code derived from <https://github.com/rolayoalarcon/MolE> is licensed
-under the **MIT License**, © 2024 Roberto Olayo Alarcon.
-Model weights are licensed under **Creative Commons Attribution 4.0
-International (CC BY 4.0)**, © 2024 Roberto Olayo Alarcon.
-Additional code © 2025 Maksim Pavlov, licensed under MIT.
+Code derived from <https://github.com/rolayoalarcon/MolE> is licensed under the **MIT License**, © 2024 Roberto Olayo Alarcon. Model weights are licensed under **Creative Commons Attribution 4.0 International (CC BY 4.0)**, © 2024 Roberto Olayo Alarcon. Additional code © 2025 Maksim Pavlov, licensed under MIT.
 
 <!-- # MolE - Antimicrobial Prediction
 
 