Update README.md
README.md
CHANGED
|
@@ -15,6 +15,7 @@ tags:
|
|
| 15 |
- chemistry
|
| 16 |
datasets:
|
| 17 |
- roman-bushuiev/GeMS
|
|
|
|
| 18 |
metrics:
|
| 19 |
- hit@k
|
| 20 |
- cosine_similarity
|
|
@@ -24,6 +25,8 @@ metrics:
|
|
| 24 |
|
| 25 |
# NexaMass-V3-Struct
|
| 26 |
|
|
|
|
|
|
|
| 27 |
`NexaMass-V3-Struct` is a compact MS/MS spectral encoder for structure-aware representation learning and candidate-bank molecular inference. It maps tandem mass spectra into a learned spectral embedding and predicts RDKit Morgan fingerprint probabilities that can be compared against candidate molecular fingerprints. The model is intended for spectrum embedding, candidate narrowing, structure-aware retrieval research, and confidence/abstention experiments. It is not a de novo molecule generator and should not be used as a standalone top-1 molecular identifier.
|
| 28 |
|
| 29 |
The model was developed as part of the Nexa MS/MS pipeline. The foundation checkpoint, `NexaMass-V3`, was trained as a self-supervised spectral encoder over an approximately 201M-spectrum phase-1 campaign. The structure-aligned checkpoint, `NexaMass-V3-Struct`, adapts that encoder to a corrected labeled surface with real RDKit Morgan fingerprint targets. The main finding is that a small encoder can carry useful structure signal and support candidate-bank narrowing, while exact local ranking and calibrated confidence remain separate downstream problems.
|
|
@@ -34,6 +37,9 @@ The model was developed as part of the Nexa MS/MS pipeline. The foundation check
|
|
| 34 |
|
| 35 |
The model contains a spectral transformer backbone, an SSL projection head, a Morgan fingerprint structure head, a spectrum-side retrieval query projection, a fingerprint-side target projection, and experimental retrieval/reranking heads. The strongest current inference surface is the Morgan fingerprint prediction and candidate-bank decode path. The trained retrieval projection is included for research, but it is not promoted as a reliable final decision layer.
|
| 36 |
|
|
|
|
|
|
|
|
|
|
| 37 |
## Intended Use
|
| 38 |
|
| 39 |
Use this model to embed MS/MS spectra, build nearest-neighbor or clustering analyses, predict Morgan fingerprint probability vectors from spectra, score spectra against candidate banks, inspect structure-family neighborhoods, and develop ranking or confidence adapters over a frozen spectral encoder. The model is also suitable for research into structure-aware MS/MS representation learning and candidate narrowing.
|
|
@@ -82,7 +88,3 @@ MS/MS structure inference can affect downstream scientific interpretation. Users
|
|
| 82 |
## Citation
|
| 83 |
|
| 84 |
If you use this model, cite the NexaMass project release and the accompanying technical report when available. Relevant background work includes DreaMS for self-supervised MS/MS representation learning, MassSpecGym for benchmark framing, CSI:FingerID for fingerprint-mediated candidate search, and related spectra-structure retrieval and de novo generation systems such as MIST, MSNovelist, CMSSP, CSU-MS2, MSBERT, Spec2Mol, and MS2Mol.
|
| 85 |
-
|
| 86 |
-
## Recommended Name
|
| 87 |
-
|
| 88 |
-
Use `NexaMass-V3` for the promoted self-supervised foundation encoder and `NexaMass-V3-Struct` for the RDKit/Morgan-aligned checkpoint. Use `NexaMass-Atlas` for future indexed candidate-bank inference systems built on top of this encoder.
|
|
|
|
| 15 |
- chemistry
|
| 16 |
datasets:
|
| 17 |
- roman-bushuiev/GeMS
|
| 18 |
+
- roman-bushuiev/MassSpecGym
|
| 19 |
metrics:
|
| 20 |
- hit@k
|
| 21 |
- cosine_similarity
|
|
|
|
| 25 |
|
| 26 |
# NexaMass-V3-Struct
|
| 27 |
|
| 28 |
+
|
| 29 |
+

|
| 30 |
`NexaMass-V3-Struct` is a compact MS/MS spectral encoder for structure-aware representation learning and candidate-bank molecular inference. It maps tandem mass spectra into a learned spectral embedding and predicts RDKit Morgan fingerprint probabilities that can be compared against candidate molecular fingerprints. The model is intended for spectrum embedding, candidate narrowing, structure-aware retrieval research, and confidence/abstention experiments. It is not a de novo molecule generator and should not be used as a standalone top-1 molecular identifier.
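The comparison between predicted fingerprint probabilities and candidate fingerprints can be illustrated with a minimal sketch. It assumes the prediction is a plain list of per-bit probabilities and each candidate is a binary Morgan bit vector of the same length; the names `rank_candidates`, `predicted`, and `bank` are hypothetical, the 4-bit vectors are toy data, and cosine similarity is used only because it appears among the card's listed metrics. This is not the model's actual decode path.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


def rank_candidates(predicted_probs, candidate_bank):
    """Return (candidate_id, score) pairs sorted from best to worst score.

    predicted_probs: per-bit probabilities (hypothetical model output).
    candidate_bank: dict mapping candidate id -> binary fingerprint bits.
    """
    scored = [
        (cand_id, cosine_similarity(predicted_probs, bits))
        for cand_id, bits in candidate_bank.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 4-bit example; real Morgan fingerprints are much longer.
predicted = [0.9, 0.1, 0.8, 0.2]
bank = {
    "cand_a": [1, 0, 1, 0],
    "cand_b": [0, 1, 0, 1],
    "cand_c": [1, 1, 0, 0],
}
ranking = rank_candidates(predicted, bank)
for cand_id, score in ranking:
    print(f"{cand_id}: {score:.3f}")
```

A ranking like this narrows the candidate set; consistent with the caveat above, the top entry should not be read as a standalone identification.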
|
| 31 |
|
| 32 |
The model was developed as part of the Nexa MS/MS pipeline. The foundation checkpoint, `NexaMass-V3`, was trained as a self-supervised spectral encoder over an approximately 201M-spectrum phase-1 campaign. The structure-aligned checkpoint, `NexaMass-V3-Struct`, adapts that encoder to a corrected labeled surface with real RDKit Morgan fingerprint targets. The main finding is that a small encoder can carry useful structure signal and support candidate-bank narrowing, while exact local ranking and calibrated confidence remain separate downstream problems.
|
|
|
|
| 37 |
|
| 38 |
The model contains a spectral transformer backbone, an SSL projection head, a Morgan fingerprint structure head, a spectrum-side retrieval query projection, a fingerprint-side target projection, and experimental retrieval/reranking heads. The strongest current inference surface is the Morgan fingerprint prediction and candidate-bank decode path. The trained retrieval projection is included for research, but it is not promoted as a reliable final decision layer.
|
| 39 |
|
| 40 |
+
|
| 41 |
+

|
| 42 |
+
|
| 43 |
## Intended Use
|
| 44 |
|
| 45 |
Use this model to embed MS/MS spectra, build nearest-neighbor or clustering analyses, predict Morgan fingerprint probability vectors from spectra, score spectra against candidate banks, inspect structure-family neighborhoods, and develop ranking or confidence adapters over a frozen spectral encoder. The model is also suitable for research into structure-aware MS/MS representation learning and candidate narrowing.
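When scoring spectra against candidate banks, the `hit@k` metric listed in the card metadata can be computed along the following lines. This is a sketch under stated assumptions: each query already has a ranked list of candidate identifiers, exactly one true identifier is known per query, and the function name `hit_at_k` and the sample identifiers are hypothetical rather than part of the released code.

```python
def hit_at_k(ranked_lists, true_ids, k):
    """Fraction of queries whose true id appears in the top-k candidates.

    ranked_lists: one best-first list of candidate ids per query.
    true_ids: the correct candidate id for each query, in the same order.
    """
    if len(ranked_lists) != len(true_ids):
        raise ValueError("ranked_lists and true_ids must have the same length")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_ids) if truth in ranked[:k]
    )
    return hits / len(ranked_lists)


# Three toy queries with made-up candidate ids.
rankings = [
    ["m1", "m2", "m3"],  # true id ranked first
    ["m5", "m4", "m6"],  # true id ranked second
    ["m9", "m8", "m7"],  # true id ranked third
]
truths = ["m1", "m4", "m7"]

hit1 = hit_at_k(rankings, truths, 1)
hit2 = hit_at_k(rankings, truths, 2)
hit3 = hit_at_k(rankings, truths, 3)
print(hit1, hit2, hit3)
```

Reporting several values of `k` side by side shows how much of the benefit comes from narrowing versus exact top-1 ranking.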
|
|
|
|
| 88 |
## Citation
|
| 89 |
|
| 90 |
If you use this model, cite the NexaMass project release and the accompanying technical report when available. Relevant background work includes DreaMS for self-supervised MS/MS representation learning, MassSpecGym for benchmark framing, CSI:FingerID for fingerprint-mediated candidate search, and related spectra-structure retrieval and de novo generation systems such as MIST, MSNovelist, CMSSP, CSU-MS2, MSBERT, Spec2Mol, and MS2Mol.