Update README.md
README.md
CHANGED
@@ -23,8 +23,8 @@ We mainly aimed to give the model an understanding of the chemical space of smal
 conducted with a batch size of 128 for 224,000 steps, allowing the model to process each of the 9.4 million spectra approximately three times.
 The entire pretraining process, including control evaluations every 16,000 steps, took 33 hours on a single Nvidia H100 GPU.
 
-During pretraining, the percentage of correctly reconstructed validation spectra steadily increased, but remained relatively low at the end
-
+During pretraining, the percentage of correctly reconstructed validation spectra steadily increased, but remained relatively low at the end: 27\%
+for RASSP-generated spectra, 13\% for NEIMS-generated spectra, and 2\% for NIST spectra. However, 94\% of the generated SMILES
 strings (RASSP, NEIMS) were valid canonical molecules, with 83\% (RASSP), 65\% (NEIMS), and 11\% (NIST) having correct molecular formulas.
 These results suggest that during the pretraining phase, the model successfully learned molecular structure rules and the relationship between atomic
 weight and m/z values, forming a good foundation for subsequent finetuning.
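The "approximately three times" figure in the changed README text can be checked from the numbers it quotes. The sketch below is only a sanity check of that arithmetic, not code from the repository: the constant and function names are hypothetical, the dataset size is the rounded "9.4 million" from the text, and the evaluation count assumes one evaluation per full 16,000-step interval with none at step 0.

```python
# Sanity check of the pretraining figures quoted in the README text.
# All names are illustrative; only the numbers come from the README.

BATCH_SIZE = 128           # spectra per training step
TRAIN_STEPS = 224_000      # total pretraining steps
NUM_SPECTRA = 9_400_000    # "9.4 million" (rounded in the source)
EVAL_INTERVAL = 16_000     # steps between control evaluations


def passes_over_dataset(batch_size: int, steps: int, dataset_size: int) -> float:
    """Return how many times each example is seen on average."""
    return batch_size * steps / dataset_size


def evaluation_count(steps: int, interval: int) -> int:
    """Number of evaluations, assuming one per completed interval."""
    return steps // interval


samples_seen = BATCH_SIZE * TRAIN_STEPS
epochs = passes_over_dataset(BATCH_SIZE, TRAIN_STEPS, NUM_SPECTRA)
evals = evaluation_count(TRAIN_STEPS, EVAL_INTERVAL)

print(f"samples seen:        {samples_seen:,}")
print(f"passes over dataset: {epochs:.2f}")
print(f"control evaluations: {evals}")
```

Under these assumptions the model sees about 28.7 million spectra in total, which is slightly more than three passes over the dataset, consistent with the README's wording.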