Model Details
Model Name: GlycoBART
Model Type: Transformer-based sequence-to-sequence generative model for glycan structure prediction
Architecture: BART (Bidirectional and Auto-Regressive Transformers)
Version: 1.0
Date: 2025-06-25
Model Description
GlycoBART is a transformer-based generative deep learning model trained to predict glycan structures from tandem mass spectrometry (MS/MS) data. GlycoBART performs conditional sequence generation, enabling de novo glycan structure inference. The model treats mass spectra as tokenized sequences ("MS sentences") and generates corresponding glycan structures as "Glycan sentences" in an autoregressive manner.
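The "MS sentence" idea can be illustrated with a small sketch. The binning scheme below (a fixed m/z bin width, an intensity cutoff, and `mz_*` token names) is a hypothetical preprocessing choice for illustration, not the published GlycoBART tokenizer.

```python
# Hypothetical sketch: turn an MS/MS peak list into an "MS sentence" of
# discrete tokens that a seq2seq encoder could consume. Bin width, the
# intensity threshold, and token names are illustrative assumptions.

def spectrum_to_ms_sentence(peaks, bin_width=0.5, min_intensity=0.01):
    """Convert (m/z, relative intensity) pairs into a token sequence."""
    tokens = ["<bos>"]
    for mz, intensity in sorted(peaks):
        if intensity >= min_intensity:        # drop low-intensity noise peaks
            tokens.append(f"mz_{int(mz / bin_width)}")  # discretized m/z token
    tokens.append("<eos>")
    return tokens

# Example: two informative peaks plus one below the noise threshold.
sentence = spectrum_to_ms_sentence([(204.087, 0.9), (366.140, 0.5), (128.9, 0.005)])
```

The decoder would then generate the corresponding "Glycan sentence" token by token, conditioned on this encoded spectrum.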
Architecture Details
- Base Architecture: BartForConditionalGeneration
- Parameters: 207 million
- Encoder Layers: 12 transformer layers
- Decoder Layers: 12 transformer layers
- Attention Heads: 16 per layer
- Hidden Size: 768 dimensions
- Max Sequence Length: 512 tokens
- Vocabulary Size: 10,080 tokens (includes glycan vocabulary)
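As a sanity check, the listed hyperparameters can be turned into a back-of-envelope parameter count. The sketch below assumes a standard BART layout with an FFN inner size of 4x the hidden size and ignores biases and layer norms; it is an estimate, not the exact accounting behind the 207M figure.

```python
# Rough parameter estimate for the configuration listed above
# (hidden 768, 12 encoder + 12 decoder layers, vocab 10,080, max 512
# positions). The 4x FFN multiplier is an assumption; biases and layer
# norms are omitted, so this slightly understates the true count.

def approx_bart_params(hidden=768, enc_layers=12, dec_layers=12,
                       vocab=10_080, max_pos=512, ffn_mult=4):
    embed = (vocab + max_pos) * hidden        # token + positional embeddings
    attn = 4 * hidden * hidden                # Q, K, V, and output projections
    ffn = 2 * hidden * (ffn_mult * hidden)    # up- and down-projections
    enc_layer = attn + ffn
    dec_layer = 2 * attn + ffn                # self-attention + cross-attention
    return embed + enc_layers * enc_layer + dec_layers * dec_layer

total = approx_bart_params()  # lands near the stated 207M parameters
```

The estimate comes out around 206M, consistent with the 207M figure once biases, layer norms, and final-layer parameters are included.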
Intended Use
Primary Use Cases
- De novo glycan structure prediction from MS/MS spectra
- Discovery of novel glycan structures not present in training data
- High-throughput glycomics analysis with generative capability
- Structural annotation of mass spectrometry data beyond known databases
- Research applications in glycobiology and glycoinformatics
Ethical Considerations
Responsible Use
- Model predictions should be validated experimentally, especially for de novo structures
- Not intended for direct clinical decision-making without proper validation
- Users should understand the model's generative limitations and scope
- Novel structure predictions require additional verification through database searches and experimental validation
Potential Risks
- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data significantly different from training distribution
- Potential generation of chemically implausible structures requiring expert review
Model Version
- GlycoBART-F: variant trained on the full dataset
Code and Data Availability
- Repository: GitHub (glycotrans)
- Training Data:
- Example Inference: Google Colab