eachanjohnson committed
Commit 1032858 · verified · 1 Parent(s): d604210

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,140 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ pipeline_tag: tabular-regression
4
+ tags:
5
+ - chemistry
6
+ - microbiology
7
+ - antibiotics
8
+ library_name: duvida
9
+ datasets:
10
+ - scbirlab/thomas-2018-spark-wt
11
+ ---
12
+
13
+ # Predictor of _Staphylococcus aureus_ MICs
14
+
15
+ _Updated:_ Fri 28 Mar 14:27:41 GMT 2025
16
+
17
+ Trained on the _Staphylococcus aureus_, WT accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) (2115 rows in total for _Staphylococcus aureus_).
18
+
19
+ ## Model details
20
+
21
+ This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida).
22
+ It was selected from a hyperparameter search as the model that performed best on unseen test data
23
+ (from a scaffold split).
24
+
25
+ Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
26
+ based on that training data.
27
+
28
+ This model is the best regression model from a hyperparameter search, determined
29
+ by Spearman's $\rho$ on a held-out test set not used in training or early stopping.
30
+
31
+ ### Model architecture
32
+
33
+ - **Regression**
34
+
35
+ ```json
36
+
37
+ {
38
+ "dropout": 0.2,
39
+ "ensemble_size": 10,
40
+ "extra_featurizers": null,
41
+ "learning_rate": 0.0001,
42
+ "model_class": "ChempropModelBox",
43
+ "n_hidden": 1,
44
+ "n_units": 16,
45
+ "use_2d": false,
46
+ "use_fp": true
47
+ }
48
+ ```
49
+
50
+ ### Model usage
51
+
52
+ You can use this model with:
53
+
54
+ ```python
55
+ from duvida.autoclasses import AutoModelBox
56
+ modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-saur")
57
+ modelbox.predict(filename=..., inputs=[...], columns=[...]) # make predictions on your own data
58
+ ```
59
+
60
+ ## Training details
61
+
62
+ - **Dataset:** [SPARK, WT accumulator, _Staphylococcus aureus_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
63
+ - **Input column:** smiles
64
+ - **Output column:** pmic
65
+ - **Split type:** Murcko scaffold
66
+ - **Split proportions:**
67
+   - 70% training (1424 rows)
68
+   - 15% validation (for early stopping) (309 rows)
69
+   - 15% test (for selecting hyperparameters) (316 rows)
70
+
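As a quick arithmetic check of the split sizes listed above, here is a minimal sketch that uses only the row counts from this README (the variable names are illustrative):

```python
# Row counts reported in this README for the scaffold split.
split_rows = {"training": 1424, "validation": 309, "test": 316}

total_rows = sum(split_rows.values())

# Share of rows in each split, as a percentage rounded to one decimal place.
split_percent = {
    name: round(100 * n_rows / total_rows, 1)
    for name, n_rows in split_rows.items()
}

print(total_rows)
print(split_percent)
```

A scaffold split assigns whole scaffold groups to a single split, so the realized shares (about 69.5%, 15.1% and 15.4% of 2049 rows) only approximate the nominal 70/15/15.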
71
+ Here is the training log:
72
+
73
+ <img src="training-log.png" width=450>
74
+
75
+ And these are the evaluation scores.
76
+
77
+ Train (1424 rows):
78
+
79
+ ```json
80
+
81
+ {
82
+ "Pearson r": 0.8988537726404331,
83
+ "RMSE": 0.26244109869003296,
84
+ "Spearman rho": 0.7846286417647882
85
+ }
86
+ ```
87
+
88
+ Validation (309 rows):
89
+
90
+ ```json
91
+
92
+ {
93
+ "Pearson r": 0.9436229039637082,
94
+ "RMSE": 0.3413218855857849,
95
+ "Spearman rho": 0.8756560069371535
96
+ }
97
+ ```
98
+
99
+
100
+ Test (316 rows):
101
+
102
+ ```json
103
+
104
+ {
105
+ "Pearson r": 0.7476855201027743,
106
+ "RMSE": 0.7906074523925781,
107
+ "Spearman rho": 0.8293761872496432
108
+ }
109
+ ```
110
+
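For readers who want to recompute scores like the ones above from their own predictions, here is a minimal standard-library sketch of the three metrics. The function names and the toy values are hypothetical, and this is not necessarily how Duvida computes them internally.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical observed and predicted values, for illustration only.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.8, 6.5, 6.9]

scores = {
    "Pearson r": pearson_r(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "Spearman rho": spearman_rho(observed, predicted),
}
print(scores)
```

Spearman's rho depends only on rank order, while Pearson's r is sensitive to the size of individual errors, so the two can diverge on the same predictions, as they do in the test scores above.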
111
+ ## Training data details
112
+
113
+ The training data were collated by the authors of:
114
+
115
+ > Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
116
+ > Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
117
+ > ACS Infectious Diseases 2018 4 (11), 1536-1539
118
+ > DOI: 10.1021/acsinfecdis.8b00193
119
+
120
+ We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
121
+ give succinct column titles, and split by species.
122
+
123
+ This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.
124
+
125
+ ### Dataset Sources
126
+
127
+ - **Repository:** https://www.collaborativedrug.com/spark-data-downloads
128
+ - **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193
129
+
130
+ ### Data Collection and Processing
131
+
132
+ Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.
133
+
134
+ The SMILES strings have been canonicalized, and the data were split into training (70%), validation (15%), and test (15%) sets
135
+ by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
136
+ topological polar surface area have also been calculated.
137
+
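The idea behind a scaffold split can be sketched as follows: rows that share a scaffold are kept together, so no scaffold appears in more than one split. The sketch assumes the scaffold strings have already been computed (for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`). The function name `scaffold_split` and the greedy largest-group-first assignment are illustrative assumptions, not necessarily the procedure schemist uses.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.7, 0.15, 0.15)):
    """Assign row indices to train/validation/test, keeping each scaffold group whole.

    `scaffolds` holds one scaffold string per row. Groups are placed largest
    first into the split that is furthest below its target size.
    """
    groups = defaultdict(list)
    for row_index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(row_index)

    names = ("train", "validation", "test")
    targets = [fraction * len(scaffolds) for fraction in fractions]
    splits = {name: [] for name in names}

    # Sort by group size (descending), then by scaffold string for determinism.
    for scaffold, rows in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        # Pick the split with the largest remaining deficit relative to its target.
        deficits = [targets[i] - len(splits[name]) for i, name in enumerate(names)]
        chosen = names[deficits.index(max(deficits))]
        splits[chosen].extend(rows)
    return splits


# Hypothetical scaffold labels for ten molecules (not taken from the SPARK data).
example_scaffolds = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
example_splits = scaffold_split(example_scaffolds)
print(example_splits)
```

Because whole groups move together, the realized split sizes only approximate the requested fractions, and the test set contains scaffolds the model never saw during training.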
138
+ ### Who are the source data producers?
139
+
140
+ Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
data-config.json ADDED
@@ -0,0 +1,17 @@
1
+ {
2
+ "_default_cache": "cache/duvida/data",
3
+ "_in_key": "inputs",
4
+ "_input_cols": [
5
+ "smiles"
6
+ ],
7
+ "_label_cols": [
8
+ "pmic"
9
+ ],
10
+ "_out_key": "labels",
11
+ "input_shape": [
12
+ 2048
13
+ ],
14
+ "output_shape": [
15
+ 1
16
+ ]
17
+ }
data-load-args.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache",
3
+ "features": [
4
+ "smiles"
5
+ ],
6
+ "filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz",
7
+ "labels": [
8
+ "pmic"
9
+ ]
10
+ }
eval-metrics_test.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Pearson r": 0.7476855201027743,
3
+ "RMSE": 0.7906074523925781,
4
+ "Spearman rho": 0.8293761872496432
5
+ }
eval-metrics_train.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Pearson r": 0.8988537726404331,
3
+ "RMSE": 0.26244109869003296,
4
+ "Spearman rho": 0.7846286417647882
5
+ }
eval-metrics_validation.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Pearson r": 0.9436229039637082,
3
+ "RMSE": 0.3413218855857849,
4
+ "Spearman rho": 0.8756560069371535
5
+ }
input-data.hf/data-00000-of-00001.arrow ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:900da3bf86034317afa2c12fc68b300aedad60e84d41775622b7d1c5f53e85c2
3
+ size 201832
input-data.hf/dataset_info.json ADDED
@@ -0,0 +1,52 @@
1
+ {
2
+ "builder_name": "csv",
3
+ "citation": "",
4
+ "config_name": "default",
5
+ "dataset_name": "csv",
6
+ "dataset_size": 547635,
7
+ "description": "",
8
+ "download_checksums": {
9
+ "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz": {
10
+ "num_bytes": 89427,
11
+ "checksum": null
12
+ }
13
+ },
14
+ "download_size": 89427,
15
+ "features": {
16
+ "smiles": {
17
+ "dtype": "string",
18
+ "_type": "Value"
19
+ },
20
+ "inputs": {
21
+ "feature": {
22
+ "dtype": "string",
23
+ "_type": "Value"
24
+ },
25
+ "_type": "Sequence"
26
+ },
27
+ "labels": {
28
+ "feature": {
29
+ "dtype": "float64",
30
+ "_type": "Value"
31
+ },
32
+ "_type": "Sequence"
33
+ }
34
+ },
35
+ "homepage": "",
36
+ "license": "",
37
+ "size_in_bytes": 637062,
38
+ "splits": {
39
+ "train": {
40
+ "name": "train",
41
+ "num_bytes": 547635,
42
+ "num_examples": 1424,
43
+ "dataset_name": "csv"
44
+ }
45
+ },
46
+ "version": {
47
+ "version_str": "0.0.0",
48
+ "major": 0,
49
+ "minor": 0,
50
+ "patch": 0
51
+ }
52
+ }
input-data.hf/state.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "_data_files": [
3
+ {
4
+ "filename": "data-00000-of-00001.arrow"
5
+ }
6
+ ],
7
+ "_fingerprint": "c23b3d599a50a0d3",
8
+ "_format_columns": null,
9
+ "_format_kwargs": {},
10
+ "_format_type": null,
11
+ "_output_all_columns": false,
12
+ "_split": "train"
13
+ }
logs-csv/lightning_logs/version_0/hparams.yaml ADDED
@@ -0,0 +1 @@
1
+ {}
logs-csv/lightning_logs/version_0/metrics.csv ADDED
@@ -0,0 +1,85 @@
1
+ epoch,loss,step,val_loss
2
+ 0,,88,4.601920127868652
3
+ 0,7.492936611175537,88,
4
+ 1,,177,4.012445449829102
5
+ 1,1.5991687774658203,177,
6
+ 2,,266,3.6562659740448
7
+ 2,1.41478431224823,266,
8
+ 3,,355,2.806457996368408
9
+ 3,1.3028432130813599,355,
10
+ 4,,444,2.421854257583618
11
+ 4,1.1816277503967285,444,
12
+ 5,,533,2.188049554824829
13
+ 5,1.0972480773925781,533,
14
+ 6,,622,1.5965540409088135
15
+ 6,1.025787591934204,622,
16
+ 7,,711,1.5013176202774048
17
+ 7,0.9804941415786743,711,
18
+ 8,,800,1.1018073558807373
19
+ 8,0.9172889590263367,800,
20
+ 9,,889,1.0003383159637451
21
+ 9,0.8965839743614197,889,
22
+ 10,,978,0.8400489687919617
23
+ 10,0.8749857544898987,978,
24
+ 11,,1067,0.8077839612960815
25
+ 11,0.8283694982528687,1067,
26
+ 12,,1156,0.6920057535171509
27
+ 12,0.7916938662528992,1156,
28
+ 13,,1245,0.7309020161628723
29
+ 13,0.7782658934593201,1245,
30
+ 14,,1334,0.5796863436698914
31
+ 14,0.7399182319641113,1334,
32
+ 15,,1423,0.44175010919570923
33
+ 15,0.6986851096153259,1423,
34
+ 16,,1512,0.47962722182273865
35
+ 16,0.6836443543434143,1512,
36
+ 17,,1601,0.4024624228477478
37
+ 17,0.6603368520736694,1601,
38
+ 18,,1690,0.38161563873291016
39
+ 18,0.6568854451179504,1690,
40
+ 19,,1779,0.351207435131073
41
+ 19,0.6209790706634521,1779,
42
+ 20,,1868,0.3669344484806061
43
+ 20,0.5972228050231934,1868,
44
+ 21,,1957,0.3335961699485779
45
+ 21,0.598533570766449,1957,
46
+ 22,,2046,0.35952678322792053
47
+ 22,0.5858601927757263,2046,
48
+ 23,,2135,0.33657750487327576
49
+ 23,0.5555119514465332,2135,
50
+ 24,,2224,0.338556706905365
51
+ 24,0.5531841516494751,2224,
52
+ 25,,2313,0.3455941379070282
53
+ 25,0.5307201147079468,2313,
54
+ 26,,2402,0.3448736071586609
55
+ 26,0.5229456424713135,2402,
56
+ 27,,2491,0.3512353301048279
57
+ 27,0.49847835302352905,2491,
58
+ 28,,2580,0.35296955704689026
59
+ 28,0.4923514425754547,2580,
60
+ 29,,2669,0.3565228581428528
61
+ 29,0.501798152923584,2669,
62
+ 30,,2758,0.3635339140892029
63
+ 30,0.4853680729866028,2758,
64
+ 31,,2847,0.3745820224285126
65
+ 31,0.4735918343067169,2847,
66
+ 32,,2936,0.3685125708580017
67
+ 32,0.4744315445423126,2936,
68
+ 33,,3025,0.37449511885643005
69
+ 33,0.4655072093009949,3025,
70
+ 34,,3114,0.40137070417404175
71
+ 34,0.4816225469112396,3114,
72
+ 35,,3203,0.3752176761627197
73
+ 35,0.4698217213153839,3203,
74
+ 36,,3292,0.38168954849243164
75
+ 36,0.45470479130744934,3292,
76
+ 37,,3381,0.38085946440696716
77
+ 37,0.44848036766052246,3381,
78
+ 38,,3470,0.3908793032169342
79
+ 38,0.4452821612358093,3470,
80
+ 39,,3559,0.3829881548881531
81
+ 39,0.4430929124355316,3559,
82
+ 40,,3648,0.38418999314308167
83
+ 40,0.4391266405582428,3648,
84
+ 41,,3737,0.38192063570022583
85
+ 41,0.4466865658760071,3737,
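The log above stores each epoch on two rows, one carrying only `val_loss` and one carrying only `loss`. A minimal sketch of merging them into one record per epoch and finding the epoch with the lowest validation loss, using a few rows copied from the file (the variable names are illustrative):

```python
import csv
import io

# A few rows copied from the metrics.csv above; each epoch appears twice,
# once with val_loss only and once with loss only.
raw_log = """epoch,loss,step,val_loss
19,,1779,0.351207435131073
19,0.6209790706634521,1779,
20,,1868,0.3669344484806061
20,0.5972228050231934,1868,
21,,1957,0.3335961699485779
21,0.598533570766449,1957,
22,,2046,0.35952678322792053
22,0.5858601927757263,2046,
"""

merged = {}
for row in csv.DictReader(io.StringIO(raw_log)):
    epoch = int(row["epoch"])
    record = merged.setdefault(epoch, {"epoch": epoch, "step": int(row["step"])})
    # Keep whichever of loss / val_loss is present on this row.
    for key in ("loss", "val_loss"):
        if row[key] != "":
            record[key] = float(row[key])

# Epoch with the lowest validation loss in this excerpt.
best = min(merged.values(), key=lambda record: record["val_loss"])
print(best)
```

In the full log the lowest `val_loss` is also at epoch 21, and the log ends at epoch 41 even though `training-args.json` sets `epochs` to 2000. That is consistent with early stopping on the validation loss, although the patience setting itself is not recorded in this commit.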
logs/lightning_logs/version_0/events.out.tfevents.1743097756.cn020.3140491.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f0a9d09df8a21cc42ef0056f12542d37b8b4ffa3e297477a28c26ee74a4445f
3
+ size 7560
logs/lightning_logs/version_0/hparams.yaml ADDED
@@ -0,0 +1 @@
1
+ {}
metrics.csv ADDED
@@ -0,0 +1,4 @@
1
+ split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
2
+ train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.26244109869003296,0.8988537726404331,0.7846286417647882
3
+ validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.3413218855857849,0.9436229039637082,0.8756560069371535
4
+ test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-test.csv.gz,40,ChempropModelBox,2653010,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Staphylococcus-aureus/40/cache,,False,True,0.2,10,0.0001,1,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz,2000,16,0.7906074523925781,0.7476855201027743,0.8293761872496432
modelbox-config.json ADDED
@@ -0,0 +1,11 @@
1
+ {
2
+ "dropout": 0.2,
3
+ "ensemble_size": 10,
4
+ "extra_featurizers": null,
5
+ "learning_rate": 0.0001,
6
+ "model_class": "ChempropModelBox",
7
+ "n_hidden": 1,
8
+ "n_units": 16,
9
+ "use_2d": false,
10
+ "use_fp": true
11
+ }
params.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:736b9d735e4d4c5e1397edee5768bacbca0f5b0e42893aa34208c4323ca64bf3
3
+ size 10696484
predictions_test.csv.gz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a9acde7d7a0bb5b152ac795d43830589811b908ed646b5fdf2b8466524217b5d
3
+ size 120405
predictions_train.csv.gz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf2f6fb9ac3ebdae0a7d061bfa9d6b306144b9c8f9aa8fcc2d1a8e721360f119
3
+ size 423363
predictions_validation.csv.gz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7cb0dfa5badc93dcc9080d96c1367b6a54a77c20ae6618c665f5782ca6f3b409
3
+ size 111443
training-args.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "batch_size": 16,
3
+ "epochs": 2000,
4
+ "val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-validation.csv.gz"
5
+ }
training-data.hf/cache-5e5f3860b4738024.arrow ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c30c7291475d48b6adfa4a3184562593468c7eda33ae3a89ef91f54907695a6a
3
+ size 42076408
training-data.hf/data-00000-of-00001.arrow ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b7ae972edc8c8b602b7dd444ead34dabd1be1a54538e8cee54a59f1eb395d18
3
+ size 41882032
training-data.hf/dataset_info.json ADDED
@@ -0,0 +1,126 @@
1
+ {
2
+ "builder_name": "csv",
3
+ "citation": "",
4
+ "config_name": "default",
5
+ "dataset_name": "csv",
6
+ "dataset_size": 547635,
7
+ "description": "",
8
+ "download_checksums": {
9
+ "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Staphylococcus-aureus/scaffold-split-train.csv.gz": {
10
+ "num_bytes": 89427,
11
+ "checksum": null
12
+ }
13
+ },
14
+ "download_size": 89427,
15
+ "features": {
16
+ "smiles": {
17
+ "feature": {
18
+ "dtype": "string",
19
+ "_type": "Value"
20
+ },
21
+ "_type": "Sequence"
22
+ },
23
+ "inputs": {
24
+ "V_d": {
25
+ "dtype": "null",
26
+ "_type": "Value"
27
+ },
28
+ "gt_mask": {
29
+ "dtype": "null",
30
+ "_type": "Value"
31
+ },
32
+ "lt_mask": {
33
+ "dtype": "null",
34
+ "_type": "Value"
35
+ },
36
+ "mg": {
37
+ "E": {
38
+ "feature": {
39
+ "feature": {
40
+ "dtype": "float32",
41
+ "_type": "Value"
42
+ },
43
+ "_type": "Sequence"
44
+ },
45
+ "_type": "Sequence"
46
+ },
47
+ "V": {
48
+ "feature": {
49
+ "feature": {
50
+ "dtype": "float32",
51
+ "_type": "Value"
52
+ },
53
+ "_type": "Sequence"
54
+ },
55
+ "_type": "Sequence"
56
+ },
57
+ "edge_index": {
58
+ "feature": {
59
+ "feature": {
60
+ "dtype": "float32",
61
+ "_type": "Value"
62
+ },
63
+ "_type": "Sequence"
64
+ },
65
+ "_type": "Sequence"
66
+ },
67
+ "rev_edge_index": {
68
+ "feature": {
69
+ "dtype": "float32",
70
+ "_type": "Value"
71
+ },
72
+ "_type": "Sequence"
73
+ }
74
+ },
75
+ "weight": {
76
+ "dtype": "float32",
77
+ "_type": "Value"
78
+ },
79
+ "x_d": {
80
+ "feature": {
81
+ "dtype": "float32",
82
+ "_type": "Value"
83
+ },
84
+ "_type": "Sequence"
85
+ },
86
+ "y": {
87
+ "feature": {
88
+ "dtype": "float32",
89
+ "_type": "Value"
90
+ },
91
+ "_type": "Sequence"
92
+ }
93
+ },
94
+ "labels": {
95
+ "feature": {
96
+ "dtype": "float64",
97
+ "_type": "Value"
98
+ },
99
+ "_type": "Sequence"
100
+ },
101
+ "extra_features": {
102
+ "feature": {
103
+ "dtype": "float32",
104
+ "_type": "Value"
105
+ },
106
+ "_type": "Sequence"
107
+ }
108
+ },
109
+ "homepage": "",
110
+ "license": "",
111
+ "size_in_bytes": 637062,
112
+ "splits": {
113
+ "train": {
114
+ "name": "train",
115
+ "num_bytes": 547635,
116
+ "num_examples": 1424,
117
+ "dataset_name": "csv"
118
+ }
119
+ },
120
+ "version": {
121
+ "version_str": "0.0.0",
122
+ "major": 0,
123
+ "minor": 0,
124
+ "patch": 0
125
+ }
126
+ }
training-data.hf/state.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "_data_files": [
3
+ {
4
+ "filename": "data-00000-of-00001.arrow"
5
+ }
6
+ ],
7
+ "_fingerprint": "5629eb6ca89ad40c",
8
+ "_format_columns": null,
9
+ "_format_kwargs": {
10
+ "dtype": "float"
11
+ },
12
+ "_format_type": "numpy",
13
+ "_output_all_columns": false,
14
+ "_split": "train"
15
+ }
training-log.csv ADDED
@@ -0,0 +1,43 @@
1
+ epoch,step,loss,val_loss
2
+ 0,88,7.492936611175537,4.601920127868652
3
+ 1,177,1.5991687774658203,4.012445449829102
4
+ 2,266,1.41478431224823,3.6562659740448
5
+ 3,355,1.30284321308136,2.806457996368408
6
+ 4,444,1.1816277503967283,2.421854257583618
7
+ 5,533,1.097248077392578,2.188049554824829
8
+ 6,622,1.025787591934204,1.5965540409088137
9
+ 7,711,0.9804941415786744,1.5013176202774048
10
+ 8,800,0.9172889590263368,1.1018073558807373
11
+ 9,889,0.8965839743614197,1.0003383159637451
12
+ 10,978,0.8749857544898987,0.8400489687919617
13
+ 11,1067,0.8283694982528687,0.8077839612960815
14
+ 12,1156,0.7916938662528992,0.6920057535171509
15
+ 13,1245,0.7782658934593201,0.7309020161628723
16
+ 14,1334,0.7399182319641113,0.5796863436698914
17
+ 15,1423,0.6986851096153259,0.4417501091957092
18
+ 16,1512,0.6836443543434143,0.4796272218227386
19
+ 17,1601,0.6603368520736694,0.4024624228477478
20
+ 18,1690,0.6568854451179504,0.3816156387329101
21
+ 19,1779,0.6209790706634521,0.351207435131073
22
+ 20,1868,0.5972228050231934,0.3669344484806061
23
+ 21,1957,0.598533570766449,0.3335961699485779
24
+ 22,2046,0.5858601927757263,0.3595267832279205
25
+ 23,2135,0.5555119514465332,0.3365775048732757
26
+ 24,2224,0.5531841516494751,0.338556706905365
27
+ 25,2313,0.5307201147079468,0.3455941379070282
28
+ 26,2402,0.5229456424713135,0.3448736071586609
29
+ 27,2491,0.498478353023529,0.3512353301048279
30
+ 28,2580,0.4923514425754547,0.3529695570468902
31
+ 29,2669,0.501798152923584,0.3565228581428528
32
+ 30,2758,0.4853680729866028,0.3635339140892029
33
+ 31,2847,0.4735918343067169,0.3745820224285126
34
+ 32,2936,0.4744315445423126,0.3685125708580017
35
+ 33,3025,0.4655072093009949,0.37449511885643
36
+ 34,3114,0.4816225469112396,0.4013707041740417
37
+ 35,3203,0.4698217213153839,0.3752176761627197
38
+ 36,3292,0.4547047913074493,0.3816895484924316
39
+ 37,3381,0.4484803676605224,0.3808594644069671
40
+ 38,3470,0.4452821612358093,0.3908793032169342
41
+ 39,3559,0.4430929124355316,0.3829881548881531
42
+ 40,3648,0.4391266405582428,0.3841899931430816
43
+ 41,3737,0.4466865658760071,0.3819206357002258
training-log.png ADDED