Commit 677b638 (verified) · committed by Allanatrix · 1 Parent(s): bcd19f0

Update README.md

  1. README.md +6 -4
README.md CHANGED
@@ -15,6 +15,7 @@ tags:
15
  - chemistry
16
  datasets:
17
  - roman-bushuiev/GeMS
 
18
  metrics:
19
  - hit@k
20
  - cosine_similarity
@@ -24,6 +25,8 @@ metrics:
24
 
25
  # NexaMass-V3-Struct
26
 
 
 
27
  `NexaMass-V3-Struct` is a compact MS/MS spectral encoder for structure-aware representation learning and candidate-bank molecular inference. It maps tandem mass spectra into a learned spectral embedding and predicts RDKit Morgan fingerprint probabilities that can be compared against candidate molecular fingerprints. The model is intended for spectrum embedding, candidate narrowing, structure-aware retrieval research, and confidence/abstention experiments. It is not a de novo molecule generator and should not be used as a standalone top-1 molecular identifier.
28
 
29
  The model was developed as part of the Nexa MS/MS pipeline. The foundation checkpoint, `NexaMass-V3`, was trained as a self-supervised spectral encoder over an approximately 201M-spectrum phase-1 campaign. The structure-aligned checkpoint, `NexaMass-V3-Struct`, adapts that encoder to a corrected labeled surface with real RDKit Morgan fingerprint targets. The main finding is that a small encoder can carry useful structure signal and support candidate-bank narrowing, while exact local ranking and calibrated confidence remain separate downstream problems.
@@ -34,6 +37,9 @@ The model was developed as part of the Nexa MS/MS pipeline. The foundation check
34
 
35
  The model contains a spectral transformer backbone, an SSL projection head, a Morgan fingerprint structure head, a spectrum-side retrieval query projection, a fingerprint-side target projection, and experimental retrieval/reranking heads. The strongest current inference surface is the Morgan fingerprint prediction and candidate-bank decode path. The trained retrieval projection is included for research, but it is not promoted as a reliable final decision layer.
36
 
 
 
 
37
  ## Intended Use
38
 
39
  Use this model to embed MS/MS spectra, build nearest-neighbor or clustering analyses, predict Morgan fingerprint probability vectors from spectra, score spectra against candidate banks, inspect structure-family neighborhoods, and develop ranking or confidence adapters over a frozen spectral encoder. The model is also suitable for research into structure-aware MS/MS representation learning and candidate narrowing.
@@ -82,7 +88,3 @@ MS/MS structure inference can affect downstream scientific interpretation. Users
82
  ## Citation
83
 
84
  If you use this model, cite the NexaMass project release and the accompanying technical report when available. Relevant background work includes DreaMS for self-supervised MS/MS representation learning, MassSpecGym for benchmark framing, CSI:FingerID for fingerprint-mediated candidate search, and related spectra-structure retrieval and de novo generation systems such as MIST, MSNovelist, CMSSP, CSU-MS2, MSBERT, Spec2Mol, and MS2Mol.
85
-
86
- ## Recommended Name
87
-
88
- Use `NexaMass-V3` for the promoted self-supervised foundation encoder and `NexaMass-V3-Struct` for the RDKit/Morgan-aligned checkpoint. Use `NexaMass-Atlas` for future indexed candidate-bank inference systems built on top of this encoder.
 
15
  - chemistry
16
  datasets:
17
  - roman-bushuiev/GeMS
18
+ - roman-bushuiev/MassSpecGym
19
  metrics:
20
  - hit@k
21
  - cosine_similarity
 
25
 
26
  # NexaMass-V3-Struct
27
 
28
+
29
+ ![nexamass_inference_density_atlas_scaling](https://cdn-uploads.huggingface.co/production/uploads/6643ee570c34964ea12d8bd9/_TrclJb8JyjuhlFG0heH_.png)
30
  `NexaMass-V3-Struct` is a compact MS/MS spectral encoder for structure-aware representation learning and candidate-bank molecular inference. It maps tandem mass spectra into a learned spectral embedding and predicts RDKit Morgan fingerprint probabilities that can be compared against candidate molecular fingerprints. The model is intended for spectrum embedding, candidate narrowing, structure-aware retrieval research, and confidence/abstention experiments. It is not a de novo molecule generator and should not be used as a standalone top-1 molecular identifier.
31
 
32
  The model was developed as part of the Nexa MS/MS pipeline. The foundation checkpoint, `NexaMass-V3`, was trained as a self-supervised spectral encoder over an approximately 201M-spectrum phase-1 campaign. The structure-aligned checkpoint, `NexaMass-V3-Struct`, adapts that encoder to a corrected labeled surface with real RDKit Morgan fingerprint targets. The main finding is that a small encoder can carry useful structure signal and support candidate-bank narrowing, while exact local ranking and calibrated confidence remain separate downstream problems.
 
37
 
38
  The model contains a spectral transformer backbone, an SSL projection head, a Morgan fingerprint structure head, a spectrum-side retrieval query projection, a fingerprint-side target projection, and experimental retrieval/reranking heads. The strongest current inference surface is the Morgan fingerprint prediction and candidate-bank decode path. The trained retrieval projection is included for research, but it is not promoted as a reliable final decision layer.
39
 
40
+
41
+ ![nexamass_encoder_architecture](https://cdn-uploads.huggingface.co/production/uploads/6643ee570c34964ea12d8bd9/N_KySKnB-St8bztjoujfr.png)
42
+
43
  ## Intended Use
44
 
45
  Use this model to embed MS/MS spectra, build nearest-neighbor or clustering analyses, predict Morgan fingerprint probability vectors from spectra, score spectra against candidate banks, inspect structure-family neighborhoods, and develop ranking or confidence adapters over a frozen spectral encoder. The model is also suitable for research into structure-aware MS/MS representation learning and candidate narrowing.
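
The candidate-bank scoring described in the README text above can be illustrated with a minimal sketch. It assumes the model's predicted Morgan fingerprint probabilities are already available as a NumPy vector and that each candidate molecule has a binary fingerprint of the same length. The function names `rank_candidates` and `hit_at_k`, the 8-bit toy fingerprints, and the choice of cosine similarity as the scoring function are assumptions made for illustration (cosine similarity and hit@k appear in the card's `metrics` list, but the card does not state how candidates are scored). In practice the candidate fingerprints would presumably come from RDKit.

```python
import numpy as np


def rank_candidates(pred_probs, candidate_fps, k=5):
    """Rank candidates by cosine similarity to a predicted fingerprint.

    pred_probs: 1-D array of per-bit probabilities (hypothetical model output).
    candidate_fps: 2-D array, one binary fingerprint per candidate row.
    Returns the indices of the top-k candidates and their scores.
    """
    pred = np.asarray(pred_probs, dtype=float)
    bank = np.asarray(candidate_fps, dtype=float)
    # Cosine similarity between the prediction and every candidate row.
    denom = np.linalg.norm(bank, axis=1) * np.linalg.norm(pred)
    scores = (bank @ pred) / np.maximum(denom, 1e-12)
    # Stable sort on negated scores puts the most similar candidate first.
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


def hit_at_k(ranked_indices, true_index, k):
    """Return True if the true candidate appears in the top-k ranking."""
    return bool(true_index in ranked_indices[:k])


# Toy data: 8-bit vectors stand in for real Morgan fingerprints.
bank = np.array([
    [1, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
])
pred = np.array([0.9, 0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.0])

order, scores = rank_candidates(pred, bank, k=2)
print(order, scores)
print(hit_at_k(order, true_index=0, k=1))
```

Consistent with the README's caution, a ranking like this is better treated as a way to narrow a candidate bank than as a final top-1 identification.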
 
88
  ## Citation
89
 
90
  If you use this model, cite the NexaMass project release and the accompanying technical report when available. Relevant background work includes DreaMS for self-supervised MS/MS representation learning, MassSpecGym for benchmark framing, CSI:FingerID for fingerprint-mediated candidate search, and related spectra-structure retrieval and de novo generation systems such as MIST, MSNovelist, CMSSP, CSU-MS2, MSBERT, Spec2Mol, and MS2Mol.