Model Details

Model Name: GlycoBART
Model Type: Transformer-based sequence-to-sequence generative model for glycan structure prediction
Architecture: BART (Bidirectional and Auto-Regressive Transformers)
Version: 1.0
Date: June 25, 2025

Model Description

GlycoBART is a transformer-based generative deep learning model trained to predict glycan structures from tandem mass spectrometry (MS/MS) data. GlycoBART performs conditional sequence generation, enabling de novo glycan structure inference. The model treats mass spectra as tokenized sequences ("MS sentences") and generates corresponding glycan structures as "Glycan sentences" in an autoregressive manner.
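The exact spectrum-to-sentence encoding is not specified here. As an illustration only, one common approach is to discretize peak m/z values into fixed-width bins and emit one token per peak in m/z order; the bin width, token naming, and intensity handling below are assumptions for this sketch, not the model's actual scheme.

```python
def spectrum_to_ms_sentence(peaks, mz_bin=0.1, top_n=100):
    """Tokenize an MS/MS spectrum into an 'MS sentence'.

    peaks: list of (mz, intensity) pairs.
    Illustrative only: keeps the top_n most intense peaks,
    bins m/z to a fixed width, and orders tokens by m/z.
    """
    # Keep the most intense peaks, then restore m/z order.
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    kept.sort(key=lambda p: p[0])
    # One token per peak: the m/z bin index as a string token.
    return [f"mz_{round(mz / mz_bin)}" for mz, _ in kept]

tokens = spectrum_to_ms_sentence([(204.087, 1200.0), (366.14, 800.0), (528.19, 150.0)])
print(tokens)  # one token per retained peak, in m/z order
```

The resulting token list can then be fed to the encoder like any other sequence, with the decoder emitting the corresponding "Glycan sentence" token by token.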

Architecture Details

  • Base Architecture: BartForConditionalGeneration
  • Parameters: 207 million
  • Encoder Layers: 12 transformer layers
  • Decoder Layers: 12 transformer layers
  • Attention Heads: 16 per layer
  • Hidden Size: 768 dimensions
  • Max Sequence Length: 512 tokens
  • Vocabulary Size: 10,080 tokens (includes glycan vocabulary)
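The listed hyperparameters are consistent with the 207M parameter count. A rough back-of-the-envelope tally, assuming a standard BART layout (feed-forward dimension 4× the hidden size, tied input/output token embeddings, learned positional embeddings):

```python
d, ffn, vocab, max_pos = 768, 4 * 768, 10_080, 512
layers_enc = layers_dec = 12

# One attention block: Q, K, V, and output projections (weights + biases).
attn = 4 * (d * d + d)
# Feed-forward block: two linear layers, d -> ffn -> d.
ff = (d * ffn + ffn) + (ffn * d + d)
ln = 2 * d  # one LayerNorm (scale + shift)

enc_layer = attn + ff + 2 * ln      # self-attention + FFN
dec_layer = 2 * attn + ff + 3 * ln  # self- + cross-attention + FFN

total = (layers_enc * enc_layer
         + layers_dec * dec_layer
         + vocab * d                # shared token embedding
         + 2 * max_pos * d)         # encoder + decoder position embeddings
print(f"~{total / 1e6:.0f}M parameters")  # close to the reported 207M
```

Small bookkeeping terms (final LayerNorms, BART's positional-embedding offset) are omitted, but the estimate lands within a fraction of a percent of the stated figure.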

Intended Use

Primary Use Cases

  • De novo glycan structure prediction from MS/MS spectra
  • Discovery of novel glycan structures not present in training data
  • High-throughput glycomics analysis with generative capability
  • Structural annotation of mass spectrometry data beyond known databases
  • Research applications in glycobiology and glycoinformatics
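For downstream annotation tasks like those above, a generated "Glycan sentence" typically needs to be post-processed. The token inventory here is hypothetical (the model's actual glycan vocabulary is not specified), but as a sketch, collapsing a generated sequence into a monosaccharide composition might look like:

```python
from collections import Counter

# Monosaccharide tokens assumed for illustration only; the model's
# actual glycan vocabulary is not specified in this card.
MONOSACCHARIDES = {"HexNAc", "Hex", "Fuc", "NeuAc", "NeuGc", "Xyl"}

def glycan_sentence_to_composition(tokens):
    """Collapse a generated 'Glycan sentence' into a monosaccharide
    composition, ignoring structural (branching/linkage) tokens."""
    return Counter(t for t in tokens if t in MONOSACCHARIDES)

# A hypothetical decoder output with branch tokens interleaved:
generated = ["HexNAc", "(", "Hex", "(", "Hex", ")", "Hex", ")",
             "HexNAc", "HexNAc", "Hex", "HexNAc", "Hex"]
print(glycan_sentence_to_composition(generated))
```

A composition summary like this is a convenient first check of a de novo prediction against known glycan databases before any structure-level validation.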

Ethical Considerations

Responsible Use

  • Model predictions should be validated experimentally, especially for de novo structures
  • Not intended for direct clinical decision-making without proper validation
  • Users should understand the model's generative limitations and scope
  • Novel structure predictions require additional verification through database searches and experimental validation

Potential Risks

  • Over-reliance on computational predictions without experimental validation
  • Misinterpretation of confidence scores as absolute certainty
  • Application to data significantly different from training distribution
  • Potential generation of chemically implausible structures requiring expert review
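On the confidence-score point: autoregressive models emit per-token log-probabilities, and a common practice (assumed here, with made-up numbers) is to rank candidates by length-normalized sequence log-likelihood. The sketch below shows why such a score is a relative ranking signal between candidates, not calibrated certainty that a structure is correct:

```python
import math

def sequence_confidence(token_logprobs):
    """Length-normalized sequence likelihood, mapped back to (0, 1].
    Useful for ranking candidate structures against each other,
    NOT an absolute probability that the structure is correct."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Two hypothetical candidate glycan sentences for the same spectrum:
cand_a = [-0.1, -0.2, -0.1, -0.3]                # shorter candidate
cand_b = [-0.1, -0.2, -0.1, -0.3, -0.1, -0.2]    # longer, better per token

# Raw (un-normalized) likelihood always favors the shorter sequence;
# length normalization removes that bias but remains a relative score.
print(sequence_confidence(cand_a), sequence_confidence(cand_b))
```

Even a high normalized score only says the model preferred this decoding over alternatives it considered; it carries no guarantee against the out-of-distribution and chemical-plausibility failure modes listed above.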

Model Version

  • GlycoBART-F: Version trained on the full dataset

Code and Data Availability

  • Weights format: Safetensors
  • Model size: 0.2B parameters
  • Tensor type: F32