# MassSpecGym De Novo: Generative SMILES Transformer
This is an encoder-decoder Transformer that "translates" mass spectrometry peaks directly into chemical structures (SMILES strings), without relying on a candidate database.
## Model Details
- Architecture: Spectral Encoder-Decoder with Intensity Rank Embeddings.
- Objective: Label-smoothed cross-entropy, trained with teacher forcing.
- Inference: Beam Search (k=5) with Length Penalty.
- Output: Generative SMILES strings representing the molecular structure.
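The intensity rank embeddings listed above replace each peak's raw intensity with its rank among all peaks in the spectrum, so the encoder input is invariant to absolute intensity scale. A minimal sketch of the ranking step (function and variable names are illustrative, not the repository's API):

```python
def intensity_ranks(intensities):
    """Map raw peak intensities to rank indices (0 = strongest peak).

    The rank, rather than the raw intensity, indexes a learned embedding
    table, so the encoder sees the same input regardless of how the
    spectrum was normalized.
    """
    # Positions of peaks in descending-intensity order
    order = sorted(range(len(intensities)), key=lambda i: -intensities[i])
    ranks = [0] * len(intensities)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

# Each peak token would then be embedded as, e.g.:
#   mz_embedding(mz_bin) + rank_embedding(rank)
print(intensity_ranks([0.2, 0.9, 0.5]))  # [2, 0, 1]
```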
## Performance (MassSpecGym Test Set)
The model outperforms standard generative baselines:
- Top-1 Tanimoto Similarity: 0.108
- Top-5 Tanimoto Similarity: 0.130
- MCES Distance (Top-1): 37.93 (lower is better)
- Exact Match Accuracy: 0.0% (consistent with state-of-the-art results on this benchmark)
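The Tanimoto similarity reported above is the Jaccard index over molecular fingerprint bits; in practice it is computed with a cheminformatics toolkit such as RDKit on Morgan fingerprints of the predicted and true molecules. The pure-Python sketch below just shows the underlying formula on sets of "on" bit indices:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets.

    fp_a, fp_b: iterables of "on" bit indices. Returns |A & B| / |A | B|,
    ranging from 0.0 (no shared bits) to 1.0 (identical fingerprints).
    """
    a, b = set(fp_a), set(fp_b)
    if not a and not b:  # two empty fingerprints: define similarity as 0
        return 0.0
    return len(a & b) / len(a | b)

print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5 (2 shared bits / 4 total)
```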
## Key Features
- Intensity Rank Embeddings: Prioritizes strong spectral signals to guide the generation process.
- Beam Search Decoding: Explores multiple candidate structures in parallel, yielding higher-likelihood outputs than greedy decoding.
- Teacher Forcing: Stabilizes training and speeds convergence on complex chemical grammar.
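The beam search decoding with length penalty described above can be sketched as follows. This is a minimal pure-Python version using a GNMT-style penalty; the actual decoder, tokenizer, and any hyperparameters beyond k=5 are assumptions for illustration:

```python
def beam_search(next_logprobs, bos, eos, k=5, max_len=20, alpha=0.6):
    """Beam search with a GNMT-style length penalty.

    next_logprobs(seq) -> {token: log P(token | seq)} stands in for the
    trained decoder's next-token distribution.
    """
    def lp(length):
        # Length penalty normalizer; alpha > 0 favors longer sequences
        return ((5 + length) / 6) ** alpha

    beams = [([bos], 0.0)]  # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in next_logprobs(seq).items():
                candidates.append((seq + [tok], score + logp))
        # Keep the k best expansions by length-penalized score
        candidates.sort(key=lambda c: c[1] / lp(len(c[0])), reverse=True)
        beams = []
        for seq, score in candidates[:k]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include any unfinished beams as fallbacks
    finished.sort(key=lambda c: c[1] / lp(len(c[0])), reverse=True)
    return finished[0][0]

# Toy transition "model": next-token log-probs depend only on the last token.
toy = {"<s>": {"C": -0.2, "N": -1.8},
       "C":   {"C": -1.0, "</s>": -0.5},
       "N":   {"</s>": -0.3}}
best = beam_search(lambda seq: toy[seq[-1]], "<s>", "</s>", k=2, max_len=5)
print(best)  # ['<s>', 'C', '</s>'] — the short high-probability path wins
```

With a real model, `next_logprobs` would run one decoder step over the spectrum encoding and the partial SMILES, and the top-k finished beams would be returned as the Top-5 candidates.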
## Usage
Full training and generation scripts are available in the accompanying GitHub repository.