Commit c807e0a (verified) by pavm595
1 Parent(s): bd0ecf6

Update README.md

Files changed (1)
  1. README.md +86 -22
README.md CHANGED
@@ -1,22 +1,86 @@
- # MolE - Antimicrobial Prediction
-
- This model uses MolE's pre-trained representation to train XGBoost models to predict the antimicrobial activity of compounds based on their molecular structure.
-
- ## Files:
-
- - `model.pth` - the pre-trained representation model's weights
- - `config.yaml` - model configuration
- - `MolE-XGBoost-08.03.2024_14.20.pkl` - pretrained XGBoost model
-
- ## Usage
-
- Not ready yet.
-
- ## Publication
- For more information about MolE, and how we use it to predict antimicrobial activity, you can check out the paper in Nature Communications:
- [**Pre-trained molecular representations enable antimicrobial discovery**](https://www.nature.com/articles/s41467-025-58804-4)
-
- ## GitHub
-
- The code is available here:
- [**Link to GitHub repo**](https://github.com/rolayoalarcon/mole_antimicrobial_potential)
+ ---
+ tags:
+ - pytorch
+ - pyg
+ - graph-neural-networks
+ - machine-learning
+ - barlow-twins
+ - graph-isomorphism-network
+ - molecular-biology
+ - computational-biology
+ - antibiotics
+ - antimicrobial-discovery
+ - high-throughput-screening
+ - virtual-drug-screening
+ - haicu
+ library_name: pytorch
+ language:
+ - en
+ ---
+
+ # MolE - Antimicrobial Prediction
+
+ This model uses MolE's pre-trained representation to train XGBoost models to predict the antimicrobial activity of compounds based on their molecular structure. It was developed by Roberto Olayo Alarcon et al.; more information is available in the [GitHub repository](https://github.com/rolayoalarcon/MolE) and the [accompanying paper](https://www.nature.com/articles/s41467-025-58804-4).
+
+ ## Files:
+
+ - `model.pth` - the pre-trained representation model's weights
+ - `config.yaml` - model configuration
+ - `MolE-XGBoost-08.03.2024_14.20.pkl` - pre-trained XGBoost model
+
+ ## Usage
+
+ ### Inference Example
+
+ Below is a minimal example showing how to load and run inference with **MolE** directly from the Hugging Face Hub.
+
+ ```python
+ import torch, yaml, pickle, pandas as pd
+ from huggingface_hub import hf_hub_download
+ import mole_representation, mole_antimicrobial_prediction  # local project modules; must be importable
+
+ class MolE:
+     def __init__(self, device='auto'):
+         repo = "pavm595/MolE-antimicrobial"
+         self.device = ("cuda:0" if torch.cuda.is_available() else "cpu") if device == "auto" else device
+
+         # Download the config, representation weights, and XGBoost model from the Hub, then load them
+         cfg = yaml.safe_load(open(hf_hub_download(repo, "config.yaml")))
+         self.model = mole_representation.GINet(**cfg["model"]).to(self.device)
+         self.model.load_state_dict(torch.load(hf_hub_download(repo, "model.pth"), map_location=self.device))
+         self.xgb = pickle.load(open(hf_hub_download(repo, "MolE-XGBoost-08.03.2024_14.20.pkl"), "rb"))
+
+     def predict_from_smiles(self, smiles_tsv):
+         smiles_df = mole_representation.read_smiles(smiles_tsv, "smiles", "chem_name")
+         emb = mole_representation.batch_representation(smiles_df, self.model, "smiles", "chem_name", device=self.device)
+         X_input = mole_antimicrobial_prediction.add_strains(
+             emb, "data/01.prepare_training_data/maier_screening_results.tsv.gz"  # relative path; must exist in the working directory
+         )
+         probs = self.xgb.predict_proba(X_input)[:, 1]  # probability of class 1
+         return pd.DataFrame(
+             {"antimicrobial_predictive_probability": probs},
+             index=X_input.index
+         )
+ ```
+ ### Run inference:
+
+ ```python
+ mole = MolE()
+ pred = mole.predict_from_smiles("examples/input/examples_molecules.tsv")
+ print(pred)
+ ```
+
+ ## Metadata
+
+ ### Input
+
+ The input is a TSV file with two columns: `chem_name` and `smiles`. The `chem_name` column contains the name of the molecule from PubChem, e.g. Halicin, and the `smiles` column contains its structure as a SMILES string, e.g. `C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]`. An example input is the file `examples/input/example_molecules.tsv`.
+
+ ### Output
+
+ The output is a TSV file with two columns: `pred_id` and `antimicrobial_predictive_probability`. The `pred_id` column combines a molecule and a bacterial strain, e.g. `Halicin:Akkermansia muciniphila (NT5021)`, and the `antimicrobial_predictive_probability` column contains antimicrobial potential (AP) scores for
+ molecule prioritization, reflecting the predicted probability that the given molecule inhibits growth of the corresponding strain, e.g. 0.021192694. An example output is `examples/output/example_molecules_prediction.tsv`.
+
+ ## Copyright
+
+ Code derived from https://github.com/rolayoalarcon/MolE is licensed under the MIT license, Copyright (c) 2024 Roberto Olayo Alarcon. The [model weights](https://doi.org/10.5281/zenodo.10803099) are licensed under [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/legalcode), Copyright (c) 2024 Roberto Olayo Alarcon. The other code is licensed under the MIT license, Copyright (c) 2025 Maksim Pavlov.
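
As an illustrative sketch (not part of this commit or of the MolE code), the Input and Output layouts described in the README's Metadata section could be written and parsed with Python's standard `csv` module. The helper names `write_input_tsv` and `read_predictions` are hypothetical, and splitting `pred_id` on its first colon is an assumption based on the single example the README gives.

```python
import csv
import io


def write_input_tsv(handle, molecules):
    """Write (chem_name, smiles) pairs in the two-column TSV layout described under Input."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chem_name", "smiles"])
    writer.writerows(molecules)


def read_predictions(handle):
    """Parse an output TSV into (molecule, strain, probability) tuples, highest probability first."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        # Assumption: pred_id has the form "<molecule>:<strain>"; split on the first colon only.
        molecule, strain = rec["pred_id"].split(":", 1)
        rows.append((molecule, strain, float(rec["antimicrobial_predictive_probability"])))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# In-memory demo using only the example values quoted in the README.
input_buf = io.StringIO()
write_input_tsv(input_buf, [("Halicin", "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]")])

output_buf = io.StringIO(
    "pred_id\tantimicrobial_predictive_probability\n"
    "Halicin:Akkermansia muciniphila (NT5021)\t0.021192694\n"
)
predictions = read_predictions(output_buf)

print(input_buf.getvalue())
print(predictions)
```

Passing file handles rather than paths keeps the helpers usable with both real files (`open(path, newline="")`) and in-memory buffers. If molecule names can themselves contain a colon, the split rule would need to be adjusted.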