Upload folder using huggingface_hub
- README.md +140 -3
- data-config.json +17 -0
- data-load-args.json +10 -0
- eval-metrics_test.json +5 -0
- eval-metrics_train.json +5 -0
- eval-metrics_validation.json +5 -0
- input-data.hf/data-00000-of-00001.arrow +3 -0
- input-data.hf/dataset_info.json +52 -0
- input-data.hf/state.json +13 -0
- logs-csv/lightning_logs/version_0/hparams.yaml +1 -0
- logs-csv/lightning_logs/version_0/metrics.csv +85 -0
- logs/lightning_logs/version_0/events.out.tfevents.1743097756.cn020.3140491.0 +3 -0
- logs/lightning_logs/version_0/hparams.yaml +1 -0
- metrics.csv +4 -0
- modelbox-config.json +11 -0
- params.pt +3 -0
- predictions_test.csv.gz +3 -0
- predictions_train.csv.gz +3 -0
- predictions_validation.csv.gz +3 -0
- training-args.json +5 -0
- training-data.hf/cache-5e5f3860b4738024.arrow +3 -0
- training-data.hf/data-00000-of-00001.arrow +3 -0
- training-data.hf/dataset_info.json +126 -0
- training-data.hf/state.json +15 -0
- training-log.csv +43 -0
- training-log.png +0 -0
README.md
CHANGED
|
@@ -1,3 +1,140 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
pipeline_tag: tabular-regression
|
| 4 |
+
tags:
|
| 5 |
+
- chemistry
|
| 6 |
+
- microbiology
|
| 7 |
+
- antibiotics
|
| 8 |
+
library_name: duvida
|
| 9 |
+
datasets:
|
| 10 |
+
- scbirlab/thomas-2018-spark-wt
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# Predictor of _Staphylococcus aureus_ MICs
|
| 14 |
+
|
| 15 |
+
_Updated:_ Fri 28 Mar 14:27:41 GMT 2025
|
| 16 |
+
|
| 17 |
+
Trained on the _Staphylococcus aureus_, WT accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) (2115 rows in total for _Staphylococcus aureus_).
|
| 18 |
+
|
| 19 |
+
## Model details
|
| 20 |
+
|
| 21 |
+
This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida),
|
| 22 |
+
following a hyperparameter search that selected the model performing best on unseen test data
|
| 23 |
+
(from a scaffold split).
|
| 24 |
+
|
| 25 |
+
Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
|
| 26 |
+
based on that training data.
|
| 27 |
+
|
| 28 |
+
This model is the best regression model from a hyperparameter search, determined
|
| 29 |
+
by Spearman's $\rho$ on a held-out test set not used in training or early stopping.
|
| 30 |
+
|
| 31 |
+
### Model architecture
|
| 32 |
+
|
| 33 |
+
- **Regression**
|
| 34 |
+
|
| 35 |
+
```json
|
| 36 |
+
|
| 37 |
+
{
|
| 38 |
+
"dropout": 0.2,
|
| 39 |
+
"ensemble_size": 10,
|
| 40 |
+
"extra_featurizers": null,
|
| 41 |
+
"learning_rate": 0.0001,
|
| 42 |
+
"model_class": "ChempropModelBox",
|
| 43 |
+
"n_hidden": 1,
|
| 44 |
+
"n_units": 16,
|
| 45 |
+
"use_2d": false,
|
| 46 |
+
"use_fp": true
|
| 47 |
+
}
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
### Model usage
|
| 51 |
+
|
| 52 |
+
You can use this model with:
|
| 53 |
+
|
| 54 |
+
```python
|
| 55 |
+
from duvida.autoclasses import AutoModelBox
|
| 56 |
+
modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-saur")
|
| 57 |
+
modelbox.predict(filename=..., inputs=[...], columns=[...]) # make predictions on your own data
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
## Training details
|
| 61 |
+
|
| 62 |
+
- **Dataset:** [SPARK, WT accumulator, _Staphylococcus aureus_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
|
| 63 |
+
- **Input column:** smiles
|
| 64 |
+
- **Output column:** pmic
|
| 65 |
+
- **Split type:** Murcko scaffold
|
| 66 |
+
- **Split proportions:**
|
| 67 |
+
- 70% training (1424 rows)
|
| 68 |
+
- 15% validation (for early stopping) (309 rows)
|
| 69 |
+
- 15% test (for selecting hyperparameters) (316 rows)
|
| 70 |
+
|
| 71 |
+
Here is the training log:
|
| 72 |
+
|
| 73 |
+
<img src="training-log.png" width=450>
|
| 74 |
+
|
| 75 |
+
And these are the evaluation scores.
|
| 76 |
+
|
| 77 |
+
Train (1424 rows):
|
| 78 |
+
|
| 79 |
+
```json
|
| 80 |
+
|
| 81 |
+
{
|
| 82 |
+
"Pearson r": 0.8988537726404331,
|
| 83 |
+
"RMSE": 0.26244109869003296,
|
| 84 |
+
"Spearman rho": 0.7846286417647882
|
| 85 |
+
}
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
Validation (309 rows):
|
| 89 |
+
|
| 90 |
+
```json
|
| 91 |
+
|
| 92 |
+
{
|
| 93 |
+
"Pearson r": 0.9436229039637082,
|
| 94 |
+
"RMSE": 0.3413218855857849,
|
| 95 |
+
"Spearman rho": 0.8756560069371535
|
| 96 |
+
}
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
Test (316 rows):
|
| 101 |
+
|
| 102 |
+
```json
|
| 103 |
+
|
| 104 |
+
{
|
| 105 |
+
"Pearson r": 0.7476855201027743,
|
| 106 |
+
"RMSE": 0.7906074523925781,
|
| 107 |
+
"Spearman rho": 0.8293761872496432
|
| 108 |
+
}
|
| 109 |
+
```
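
The three scores reported above for each split can be reproduced from a list of observed and predicted `pmic` values. The following is a minimal standard-library sketch, assuming the usual textbook definitions of Pearson's r, RMSE, and Spearman's rho (Pearson's r computed on average ranks). Whether Duvida computes them in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def evaluate(y_true, y_pred):
    """Return the metrics under the same keys used in the score blocks."""
    return {
        "Pearson r": pearson_r(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
        "Spearman rho": spearman_rho(y_true, y_pred),
    }
```

Because Spearman's rho depends only on rank order while Pearson's r and RMSE are sensitive to the size of individual errors, the three scores can rank the splits differently, as they do in the values above.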
|
| 110 |
+
|
| 111 |
+
## Training data details
|
| 112 |
+
|
| 113 |
+
The training data were collated by the authors of:
|
| 114 |
+
|
| 115 |
+
> Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
|
| 116 |
+
> Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
|
| 117 |
+
> ACS Infectious Diseases 2018 4 (11), 1536-1539
|
| 118 |
+
> DOI: 10.1021/acsinfecdis.8b00193
|
| 119 |
+
|
| 120 |
+
We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
|
| 121 |
+
give succinct column titles, and split by species.
|
| 122 |
+
|
| 123 |
+
This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.
|
| 124 |
+
|
| 125 |
+
### Dataset Sources
|
| 126 |
+
|
| 127 |
+
- **Repository:** https://www.collaborativedrug.com/spark-data-downloads
|
| 128 |
+
- **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193
|
| 129 |
+
|
| 130 |
+
### Data Collection and Processing
|
| 131 |
+
|
| 132 |
+
Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.
|
| 133 |
+
|
| 134 |
+
The SMILES strings were canonicalized, and the data were split into training (70%), validation (15%), and test (15%) sets
|
| 135 |
+
by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
|
| 136 |
+
topological polar surface area have also been calculated.
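
To illustrate what a split by Murcko scaffold involves, here is a minimal sketch of one common greedy heuristic: molecules are grouped by scaffold, and whole groups are assigned to a single set so that no scaffold appears in more than one of them. This is not necessarily the algorithm schemist uses. The function names and the largest-group-first ordering are assumptions made for illustration, and `murcko_scaffold` requires RDKit to be installed.

```python
from collections import defaultdict


def murcko_scaffold(smiles):
    """Return the Murcko scaffold SMILES of a molecule (requires RDKit)."""
    from rdkit.Chem.Scaffolds import MurckoScaffold

    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles)


def scaffold_split(smiles_list, scaffold_fn=murcko_scaffold,
                   fractions=(0.7, 0.15, 0.15)):
    """Greedy group split: every scaffold lands in exactly one set.

    Returns three lists of row indices (train, validation, test).
    """
    groups = defaultdict(list)
    for idx, smiles in enumerate(smiles_list):
        groups[scaffold_fn(smiles)].append(idx)

    # Assumption: the largest scaffold groups are placed first, so the
    # most common scaffolds tend to end up in the training set.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(smiles_list)
    train_cap, val_cap = fractions[0] * n, fractions[1] * n
    train, val, test = [], [], []
    for members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(val) + len(members) <= val_cap:
            val.extend(members)
        else:
            test.extend(members)  # whatever does not fit goes to test
    return train, val, test
```

Because whole groups are assigned at once, the realised proportions are only approximately 70/15/15, which is consistent with the row counts reported for this model (1424, 309, and 316).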
|
| 137 |
+
|
| 138 |
+
### Who are the source data producers?
|
| 139 |
+
|
| 140 |
+
Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
|
data-config.json
ADDED
|
@@ -0,0 +1,17 @@
|
| 1 |
+
{
|
| 2 |
+
"_default_cache": "cache/duvida/data",
|
| 3 |
+
"_in_key": "inputs",
|
| 4 |
+
"_input_cols": [
|
| 5 |
+
"smiles"
|
| 6 |
+
],
|
| 7 |
+
"_label_cols": [
|
| 8 |
+
"pmic"
|
| 9 |
+
],
|
| 10 |
+
"_out_key": "labels",
|
| 11 |
+
"input_shape": [
|
| 12 |
+
2048
|
| 13 |
+
],
|
| 14 |
+
"output_shape": [
|
| 15 |
+
1
|
| 16 |
+
]
|
| 17 |
+
}
|
data-load-args.json
ADDED
|
@@ -0,0 +1,10 @@
|
| 1 |
+
{
|
| 2 |
+
"cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache",
|
| 3 |
+
"features": [
|
| 4 |
+
"smiles"
|
| 5 |
+
],
|
| 6 |
+
"filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz",
|
| 7 |
+
"labels": [
|
| 8 |
+
"pmic"
|
| 9 |
+
]
|
| 10 |
+
}
|
eval-metrics_test.json
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
{
|
| 2 |
+
"Pearson r": 0.7476855201027743,
|
| 3 |
+
"RMSE": 0.7906074523925781,
|
| 4 |
+
"Spearman rho": 0.8293761872496432
|
| 5 |
+
}
|
eval-metrics_train.json
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
{
|
| 2 |
+
"Pearson r": 0.8988537726404331,
|
| 3 |
+
"RMSE": 0.26244109869003296,
|
| 4 |
+
"Spearman rho": 0.7846286417647882
|
| 5 |
+
}
|
eval-metrics_validation.json
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
{
|
| 2 |
+
"Pearson r": 0.9436229039637082,
|
| 3 |
+
"RMSE": 0.3413218855857849,
|
| 4 |
+
"Spearman rho": 0.8756560069371535
|
| 5 |
+
}
|
input-data.hf/data-00000-of-00001.arrow
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:900da3bf86034317afa2c12fc68b300aedad60e84d41775622b7d1c5f53e85c2
|
| 3 |
+
size 201832
|
input-data.hf/dataset_info.json
ADDED
|
@@ -0,0 +1,52 @@
|
| 1 |
+
{
|
| 2 |
+
"builder_name": "csv",
|
| 3 |
+
"citation": "",
|
| 4 |
+
"config_name": "default",
|
| 5 |
+
"dataset_name": "csv",
|
| 6 |
+
"dataset_size": 547635,
|
| 7 |
+
"description": "",
|
| 8 |
+
"download_checksums": {
|
| 9 |
+
"/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz": {
|
| 10 |
+
"num_bytes": 89427,
|
| 11 |
+
"checksum": null
|
| 12 |
+
}
|
| 13 |
+
},
|
| 14 |
+
"download_size": 89427,
|
| 15 |
+
"features": {
|
| 16 |
+
"smiles": {
|
| 17 |
+
"dtype": "string",
|
| 18 |
+
"_type": "Value"
|
| 19 |
+
},
|
| 20 |
+
"inputs": {
|
| 21 |
+
"feature": {
|
| 22 |
+
"dtype": "string",
|
| 23 |
+
"_type": "Value"
|
| 24 |
+
},
|
| 25 |
+
"_type": "Sequence"
|
| 26 |
+
},
|
| 27 |
+
"labels": {
|
| 28 |
+
"feature": {
|
| 29 |
+
"dtype": "float64",
|
| 30 |
+
"_type": "Value"
|
| 31 |
+
},
|
| 32 |
+
"_type": "Sequence"
|
| 33 |
+
}
|
| 34 |
+
},
|
| 35 |
+
"homepage": "",
|
| 36 |
+
"license": "",
|
| 37 |
+
"size_in_bytes": 637062,
|
| 38 |
+
"splits": {
|
| 39 |
+
"train": {
|
| 40 |
+
"name": "train",
|
| 41 |
+
"num_bytes": 547635,
|
| 42 |
+
"num_examples": 1424,
|
| 43 |
+
"dataset_name": "csv"
|
| 44 |
+
}
|
| 45 |
+
},
|
| 46 |
+
"version": {
|
| 47 |
+
"version_str": "0.0.0",
|
| 48 |
+
"major": 0,
|
| 49 |
+
"minor": 0,
|
| 50 |
+
"patch": 0
|
| 51 |
+
}
|
| 52 |
+
}
|
input-data.hf/state.json
ADDED
|
@@ -0,0 +1,13 @@
|
| 1 |
+
{
|
| 2 |
+
"_data_files": [
|
| 3 |
+
{
|
| 4 |
+
"filename": "data-00000-of-00001.arrow"
|
| 5 |
+
}
|
| 6 |
+
],
|
| 7 |
+
"_fingerprint": "c23b3d599a50a0d3",
|
| 8 |
+
"_format_columns": null,
|
| 9 |
+
"_format_kwargs": {},
|
| 10 |
+
"_format_type": null,
|
| 11 |
+
"_output_all_columns": false,
|
| 12 |
+
"_split": "train"
|
| 13 |
+
}
|
logs-csv/lightning_logs/version_0/hparams.yaml
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{}
|
logs-csv/lightning_logs/version_0/metrics.csv
ADDED
|
@@ -0,0 +1,85 @@
|
| 1 |
+
epoch,loss,step,val_loss
|
| 2 |
+
0,,88,4.601920127868652
|
| 3 |
+
0,7.492936611175537,88,
|
| 4 |
+
1,,177,4.012445449829102
|
| 5 |
+
1,1.5991687774658203,177,
|
| 6 |
+
2,,266,3.6562659740448
|
| 7 |
+
2,1.41478431224823,266,
|
| 8 |
+
3,,355,2.806457996368408
|
| 9 |
+
3,1.3028432130813599,355,
|
| 10 |
+
4,,444,2.421854257583618
|
| 11 |
+
4,1.1816277503967285,444,
|
| 12 |
+
5,,533,2.188049554824829
|
| 13 |
+
5,1.0972480773925781,533,
|
| 14 |
+
6,,622,1.5965540409088135
|
| 15 |
+
6,1.025787591934204,622,
|
| 16 |
+
7,,711,1.5013176202774048
|
| 17 |
+
7,0.9804941415786743,711,
|
| 18 |
+
8,,800,1.1018073558807373
|
| 19 |
+
8,0.9172889590263367,800,
|
| 20 |
+
9,,889,1.0003383159637451
|
| 21 |
+
9,0.8965839743614197,889,
|
| 22 |
+
10,,978,0.8400489687919617
|
| 23 |
+
10,0.8749857544898987,978,
|
| 24 |
+
11,,1067,0.8077839612960815
|
| 25 |
+
11,0.8283694982528687,1067,
|
| 26 |
+
12,,1156,0.6920057535171509
|
| 27 |
+
12,0.7916938662528992,1156,
|
| 28 |
+
13,,1245,0.7309020161628723
|
| 29 |
+
13,0.7782658934593201,1245,
|
| 30 |
+
14,,1334,0.5796863436698914
|
| 31 |
+
14,0.7399182319641113,1334,
|
| 32 |
+
15,,1423,0.44175010919570923
|
| 33 |
+
15,0.6986851096153259,1423,
|
| 34 |
+
16,,1512,0.47962722182273865
|
| 35 |
+
16,0.6836443543434143,1512,
|
| 36 |
+
17,,1601,0.4024624228477478
|
| 37 |
+
17,0.6603368520736694,1601,
|
| 38 |
+
18,,1690,0.38161563873291016
|
| 39 |
+
18,0.6568854451179504,1690,
|
| 40 |
+
19,,1779,0.351207435131073
|
| 41 |
+
19,0.6209790706634521,1779,
|
| 42 |
+
20,,1868,0.3669344484806061
|
| 43 |
+
20,0.5972228050231934,1868,
|
| 44 |
+
21,,1957,0.3335961699485779
|
| 45 |
+
21,0.598533570766449,1957,
|
| 46 |
+
22,,2046,0.35952678322792053
|
| 47 |
+
22,0.5858601927757263,2046,
|
| 48 |
+
23,,2135,0.33657750487327576
|
| 49 |
+
23,0.5555119514465332,2135,
|
| 50 |
+
24,,2224,0.338556706905365
|
| 51 |
+
24,0.5531841516494751,2224,
|
| 52 |
+
25,,2313,0.3455941379070282
|
| 53 |
+
25,0.5307201147079468,2313,
|
| 54 |
+
26,,2402,0.3448736071586609
|
| 55 |
+
26,0.5229456424713135,2402,
|
| 56 |
+
27,,2491,0.3512353301048279
|
| 57 |
+
27,0.49847835302352905,2491,
|
| 58 |
+
28,,2580,0.35296955704689026
|
| 59 |
+
28,0.4923514425754547,2580,
|
| 60 |
+
29,,2669,0.3565228581428528
|
| 61 |
+
29,0.501798152923584,2669,
|
| 62 |
+
30,,2758,0.3635339140892029
|
| 63 |
+
30,0.4853680729866028,2758,
|
| 64 |
+
31,,2847,0.3745820224285126
|
| 65 |
+
31,0.4735918343067169,2847,
|
| 66 |
+
32,,2936,0.3685125708580017
|
| 67 |
+
32,0.4744315445423126,2936,
|
| 68 |
+
33,,3025,0.37449511885643005
|
| 69 |
+
33,0.4655072093009949,3025,
|
| 70 |
+
34,,3114,0.40137070417404175
|
| 71 |
+
34,0.4816225469112396,3114,
|
| 72 |
+
35,,3203,0.3752176761627197
|
| 73 |
+
35,0.4698217213153839,3203,
|
| 74 |
+
36,,3292,0.38168954849243164
|
| 75 |
+
36,0.45470479130744934,3292,
|
| 76 |
+
37,,3381,0.38085946440696716
|
| 77 |
+
37,0.44848036766052246,3381,
|
| 78 |
+
38,,3470,0.3908793032169342
|
| 79 |
+
38,0.4452821612358093,3470,
|
| 80 |
+
39,,3559,0.3829881548881531
|
| 81 |
+
39,0.4430929124355316,3559,
|
| 82 |
+
40,,3648,0.38418999314308167
|
| 83 |
+
40,0.4391266405582428,3648,
|
| 84 |
+
41,,3737,0.38192063570022583
|
| 85 |
+
41,0.4466865658760071,3737,
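
In this log the training loss and the validation loss for each epoch sit on separate rows, each with the other column left empty. A minimal sketch of merging them into one row per epoch, assuming only the column names shown in the header above (`epoch`, `loss`, `step`, `val_loss`), could look like this; the function name is hypothetical.

```python
import csv
import io


def merge_epoch_rows(csv_text):
    """Combine the loss and val_loss rows that share an (epoch, step) pair."""
    merged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["epoch"]), int(row["step"]))
        record = merged.setdefault(key, {"epoch": key[0], "step": key[1]})
        for name in ("loss", "val_loss"):
            if row[name] != "":  # empty cells mean "not logged on this row"
                record[name] = float(row[name])
    return [merged[key] for key in sorted(merged)]


# The first four data rows of the log above.
SAMPLE = """epoch,loss,step,val_loss
0,,88,4.601920127868652
0,7.492936611175537,88,
1,,177,4.012445449829102
1,1.5991687774658203,177,
"""

rows = merge_epoch_rows(SAMPLE)
```

Each element of `rows` then holds `epoch`, `step`, `loss`, and `val_loss` together, which is the layout used by `training-log.csv` in this commit.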
|
logs/lightning_logs/version_0/events.out.tfevents.1743097756.cn020.3140491.0
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4f0a9d09df8a21cc42ef0056f12542d37b8b4ffa3e297477a28c26ee74a4445f
|
| 3 |
+
size 7560
|
logs/lightning_logs/version_0/hparams.yaml
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{}
|
metrics.csv
ADDED
|
@@ -0,0 +1,4 @@
|
| 1 |
+
split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
|
| 2 |
+
train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.26244109869003296,0.8988537726404331,0.7846286417647882
|
| 3 |
+
validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.3413218855857849,0.9436229039637082,0.8756560069371535
|
| 4 |
+
test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-test.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.7906074523925781,0.7476855201027743,0.8293761872496432
|
modelbox-config.json
ADDED
|
@@ -0,0 +1,11 @@
|
| 1 |
+
{
|
| 2 |
+
"dropout": 0.2,
|
| 3 |
+
"ensemble_size": 10,
|
| 4 |
+
"extra_featurizers": null,
|
| 5 |
+
"learning_rate": 0.0001,
|
| 6 |
+
"model_class": "ChempropModelBox",
|
| 7 |
+
"n_hidden": 1,
|
| 8 |
+
"n_units": 16,
|
| 9 |
+
"use_2d": false,
|
| 10 |
+
"use_fp": true
|
| 11 |
+
}
|
params.pt
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:736b9d735e4d4c5e1397edee5768bacbca0f5b0e42893aa34208c4323ca64bf3
|
| 3 |
+
size 10696484
|
predictions_test.csv.gz
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a9acde7d7a0bb5b152ac795d43830589811b908ed646b5fdf2b8466524217b5d
|
| 3 |
+
size 120405
|
predictions_train.csv.gz
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:cf2f6fb9ac3ebdae0a7d061bfa9d6b306144b9c8f9aa8fcc2d1a8e721360f119
|
| 3 |
+
size 423363
|
predictions_validation.csv.gz
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:7cb0dfa5badc93dcc9080d96c1367b6a54a77c20ae6618c665f5782ca6f3b409
|
| 3 |
+
size 111443
|
training-args.json
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
{
|
| 2 |
+
"batch_size": 16,
|
| 3 |
+
"epochs": 2000,
|
| 4 |
+
"val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz"
|
| 5 |
+
}
|
training-data.hf/cache-5e5f3860b4738024.arrow
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c30c7291475d48b6adfa4a3184562593468c7eda33ae3a89ef91f54907695a6a
|
| 3 |
+
size 42076408
|
training-data.hf/data-00000-of-00001.arrow
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6b7ae972edc8c8b602b7dd444ead34dabd1be1a54538e8cee54a59f1eb395d18
|
| 3 |
+
size 41882032
|
training-data.hf/dataset_info.json
ADDED
|
@@ -0,0 +1,126 @@
|
| 1 |
+
{
|
| 2 |
+
"builder_name": "csv",
|
| 3 |
+
"citation": "",
|
| 4 |
+
"config_name": "default",
|
| 5 |
+
"dataset_name": "csv",
|
| 6 |
+
"dataset_size": 547635,
|
| 7 |
+
"description": "",
|
| 8 |
+
"download_checksums": {
|
| 9 |
+
"/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz": {
|
| 10 |
+
"num_bytes": 89427,
|
| 11 |
+
"checksum": null
|
| 12 |
+
}
|
| 13 |
+
},
|
| 14 |
+
"download_size": 89427,
|
| 15 |
+
"features": {
|
| 16 |
+
"smiles": {
|
| 17 |
+
"feature": {
|
| 18 |
+
"dtype": "string",
|
| 19 |
+
"_type": "Value"
|
| 20 |
+
},
|
| 21 |
+
"_type": "Sequence"
|
| 22 |
+
},
|
| 23 |
+
"inputs": {
|
| 24 |
+
"V_d": {
|
| 25 |
+
"dtype": "null",
|
| 26 |
+
"_type": "Value"
|
| 27 |
+
},
|
| 28 |
+
"gt_mask": {
|
| 29 |
+
"dtype": "null",
|
| 30 |
+
"_type": "Value"
|
| 31 |
+
},
|
| 32 |
+
"lt_mask": {
|
| 33 |
+
"dtype": "null",
|
| 34 |
+
"_type": "Value"
|
| 35 |
+
},
|
| 36 |
+
"mg": {
|
| 37 |
+
"E": {
|
| 38 |
+
"feature": {
|
| 39 |
+
"feature": {
|
| 40 |
+
"dtype": "float32",
|
| 41 |
+
"_type": "Value"
|
| 42 |
+
},
|
| 43 |
+
"_type": "Sequence"
|
| 44 |
+
},
|
| 45 |
+
"_type": "Sequence"
|
| 46 |
+
},
|
| 47 |
+
"V": {
|
| 48 |
+
"feature": {
|
| 49 |
+
"feature": {
|
| 50 |
+
"dtype": "float32",
|
| 51 |
+
"_type": "Value"
|
| 52 |
+
},
|
| 53 |
+
"_type": "Sequence"
|
| 54 |
+
},
|
| 55 |
+
"_type": "Sequence"
|
| 56 |
+
},
|
| 57 |
+
"edge_index": {
|
| 58 |
+
"feature": {
|
| 59 |
+
"feature": {
|
| 60 |
+
"dtype": "float32",
|
| 61 |
+
"_type": "Value"
|
| 62 |
+
},
|
| 63 |
+
"_type": "Sequence"
|
| 64 |
+
},
|
| 65 |
+
"_type": "Sequence"
|
| 66 |
+
},
|
| 67 |
+
"rev_edge_index": {
|
| 68 |
+
"feature": {
|
| 69 |
+
"dtype": "float32",
|
| 70 |
+
"_type": "Value"
|
| 71 |
+
},
|
| 72 |
+
"_type": "Sequence"
|
| 73 |
+
}
|
| 74 |
+
},
|
| 75 |
+
"weight": {
|
| 76 |
+
"dtype": "float32",
|
| 77 |
+
"_type": "Value"
|
| 78 |
+
},
|
| 79 |
+
"x_d": {
|
| 80 |
+
"feature": {
|
| 81 |
+
"dtype": "float32",
|
| 82 |
+
"_type": "Value"
|
| 83 |
+
},
|
| 84 |
+
"_type": "Sequence"
|
| 85 |
+
},
|
| 86 |
+
"y": {
|
| 87 |
+
"feature": {
|
| 88 |
+
"dtype": "float32",
|
| 89 |
+
"_type": "Value"
|
| 90 |
+
},
|
| 91 |
+
"_type": "Sequence"
|
| 92 |
+
}
|
| 93 |
+
},
|
| 94 |
+
"labels": {
|
| 95 |
+
"feature": {
|
| 96 |
+
"dtype": "float64",
|
| 97 |
+
"_type": "Value"
|
| 98 |
+
},
|
| 99 |
+
"_type": "Sequence"
|
| 100 |
+
},
|
| 101 |
+
"extra_features": {
|
| 102 |
+
"feature": {
|
| 103 |
+
"dtype": "float32",
|
| 104 |
+
"_type": "Value"
|
| 105 |
+
},
|
| 106 |
+
"_type": "Sequence"
|
| 107 |
+
}
|
| 108 |
+
},
|
| 109 |
+
"homepage": "",
|
| 110 |
+
"license": "",
|
| 111 |
+
"size_in_bytes": 637062,
|
| 112 |
+
"splits": {
|
| 113 |
+
"train": {
|
| 114 |
+
"name": "train",
|
| 115 |
+
"num_bytes": 547635,
|
| 116 |
+
"num_examples": 1424,
|
| 117 |
+
"dataset_name": "csv"
|
| 118 |
+
}
|
| 119 |
+
},
|
| 120 |
+
"version": {
|
| 121 |
+
"version_str": "0.0.0",
|
| 122 |
+
"major": 0,
|
| 123 |
+
"minor": 0,
|
| 124 |
+
"patch": 0
|
| 125 |
+
}
|
| 126 |
+
}
|
training-data.hf/state.json
ADDED
|
@@ -0,0 +1,15 @@
|
| 1 |
+
{
|
| 2 |
+
"_data_files": [
|
| 3 |
+
{
|
| 4 |
+
"filename": "data-00000-of-00001.arrow"
|
| 5 |
+
}
|
| 6 |
+
],
|
| 7 |
+
"_fingerprint": "5629eb6ca89ad40c",
|
| 8 |
+
"_format_columns": null,
|
| 9 |
+
"_format_kwargs": {
|
| 10 |
+
"dtype": "float"
|
| 11 |
+
},
|
| 12 |
+
"_format_type": "numpy",
|
| 13 |
+
"_output_all_columns": false,
|
| 14 |
+
"_split": "train"
|
| 15 |
+
}
|
training-log.csv
ADDED
|
@@ -0,0 +1,43 @@
|
| 1 |
+
epoch,step,loss,val_loss
|
| 2 |
+
0,88,7.492936611175537,4.601920127868652
|
| 3 |
+
1,177,1.5991687774658203,4.012445449829102
|
| 4 |
+
2,266,1.41478431224823,3.6562659740448
|
| 5 |
+
3,355,1.30284321308136,2.806457996368408
|
| 6 |
+
4,444,1.1816277503967283,2.421854257583618
|
| 7 |
+
5,533,1.097248077392578,2.188049554824829
|
| 8 |
+
6,622,1.025787591934204,1.5965540409088137
|
| 9 |
+
7,711,0.9804941415786744,1.5013176202774048
|
| 10 |
+
8,800,0.9172889590263368,1.1018073558807373
|
| 11 |
+
9,889,0.8965839743614197,1.0003383159637451
|
| 12 |
+
10,978,0.8749857544898987,0.8400489687919617
|
| 13 |
+
11,1067,0.8283694982528687,0.8077839612960815
|
| 14 |
+
12,1156,0.7916938662528992,0.6920057535171509
|
| 15 |
+
13,1245,0.7782658934593201,0.7309020161628723
|
| 16 |
+
14,1334,0.7399182319641113,0.5796863436698914
|
| 17 |
+
15,1423,0.6986851096153259,0.4417501091957092
|
| 18 |
+
16,1512,0.6836443543434143,0.4796272218227386
|
| 19 |
+
17,1601,0.6603368520736694,0.4024624228477478
|
| 20 |
+
18,1690,0.6568854451179504,0.3816156387329101
|
| 21 |
+
19,1779,0.6209790706634521,0.351207435131073
|
| 22 |
+
20,1868,0.5972228050231934,0.3669344484806061
|
| 23 |
+
21,1957,0.598533570766449,0.3335961699485779
|
| 24 |
+
22,2046,0.5858601927757263,0.3595267832279205
|
| 25 |
+
23,2135,0.5555119514465332,0.3365775048732757
|
| 26 |
+
24,2224,0.5531841516494751,0.338556706905365
|
| 27 |
+
25,2313,0.5307201147079468,0.3455941379070282
|
| 28 |
+
26,2402,0.5229456424713135,0.3448736071586609
|
| 29 |
+
27,2491,0.498478353023529,0.3512353301048279
|
| 30 |
+
28,2580,0.4923514425754547,0.3529695570468902
|
| 31 |
+
29,2669,0.501798152923584,0.3565228581428528
|
| 32 |
+
30,2758,0.4853680729866028,0.3635339140892029
|
| 33 |
+
31,2847,0.4735918343067169,0.3745820224285126
|
| 34 |
+
32,2936,0.4744315445423126,0.3685125708580017
|
| 35 |
+
33,3025,0.4655072093009949,0.37449511885643
|
| 36 |
+
34,3114,0.4816225469112396,0.4013707041740417
|
| 37 |
+
35,3203,0.4698217213153839,0.3752176761627197
|
| 38 |
+
36,3292,0.4547047913074493,0.3816895484924316
|
| 39 |
+
37,3381,0.4484803676605224,0.3808594644069671
|
| 40 |
+
38,3470,0.4452821612358093,0.3908793032169342
|
| 41 |
+
39,3559,0.4430929124355316,0.3829881548881531
|
| 42 |
+
40,3648,0.4391266405582428,0.3841899931430816
|
| 43 |
+
41,3737,0.4466865658760071,0.3819206357002258
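
A log in this format can be used to locate the epoch with the lowest validation loss and to see how long training continued afterwards. The sketch below assumes only the column names in the header above; the function name is hypothetical, and the sample embeds a few rows copied from the log.

```python
import csv
import io


def best_epoch(csv_text, metric="val_loss"):
    """Return (best epoch, best metric value, epochs run after the best one)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = min(rows, key=lambda row: float(row[metric]))
    last = rows[-1]
    epochs_since_best = int(last["epoch"]) - int(best["epoch"])
    return int(best["epoch"]), float(best[metric]), epochs_since_best


# Rows for epochs 19 to 23, copied from the log above.
SAMPLE = """epoch,step,loss,val_loss
19,1779,0.6209790706634521,0.351207435131073
20,1868,0.5972228050231934,0.3669344484806061
21,1957,0.598533570766449,0.3335961699485779
22,2046,0.5858601927757263,0.3595267832279205
23,2135,0.5555119514465332,0.3365775048732757
"""

epoch, value, waited = best_epoch(SAMPLE)
```

In the full log, the lowest `val_loss` (0.3335961699485779) is at epoch 21 and the last logged epoch is 41, well short of the 2000 epochs set in `training-args.json`. That pattern is consistent with early stopping after 20 epochs without improvement, although the patience setting itself is not recorded in the files shown here.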
|
training-log.png
ADDED