Model Details

Model Name: GlycoBERT
Model Type: Transformer-based sequence classifier for glycan structure prediction
Architecture: BERT (Bidirectional Encoder Representations from Transformers)
Version: 1.0
Date: June 25, 2025

Model Description

GlycoBERT is a transformer-based deep learning model that predicts glycan structures from tandem mass spectrometry (MS/MS) data. The model treats each mass spectrum as a tokenized sequence (an "MS sentence") and performs multi-class classification, assigning the spectrum to one of 3,590 possible glycan structure classes.
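To make the "MS sentence" idea concrete, here is a minimal sketch of one way a peak list could be turned into a token sequence. The card does not describe the actual tokenization, so everything here is an assumption for illustration: the `mz_<bin>` token names, the `bin_width` parameter, the intensity-based truncation, and the function name `spectrum_to_sentence` are all hypothetical. Only the 512-token limit comes from the card.

```python
MAX_LEN = 512  # maximum sequence length stated in the model card


def spectrum_to_sentence(peaks, bin_width=1.0, max_len=MAX_LEN):
    """Convert (m/z, intensity) pairs into a list of string tokens.

    Hypothetical scheme, not the authors' tokenizer: each peak becomes
    one token named after its m/z bin.
    """
    # Reserve two positions for the [CLS] and [SEP] special tokens.
    budget = max_len - 2
    # If the spectrum is too long, keep only the most intense peaks.
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:budget]
    # Restore m/z order so the sentence reads from low to high mass.
    strongest.sort(key=lambda p: p[0])
    tokens = [f"mz_{int(mz // bin_width)}" for mz, _ in strongest]
    return ["[CLS]"] + tokens + ["[SEP]"]


example_peaks = [(204.09, 100.0), (366.14, 50.0), (138.05, 10.0)]
example_sentence = spectrum_to_sentence(example_peaks)
print(example_sentence)
```

A real pipeline would map these string tokens to integer IDs with the tokenizer shipped alongside the model weights.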

Architecture Details

  • Base Architecture: BertForSequenceClassification
  • Parameters: approximately 96.5 million
  • Layers: 12 transformer layers
  • Attention Heads: 12 per layer
  • Hidden Size: 768 dimensions
  • Max Sequence Length: 512 tokens
  • Vocabulary Size: 10,010 tokens
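As a sanity check on the figures above, the parameter count can be tallied by hand from the listed hyperparameters. This is a rough sketch that assumes a standard BERT-base layout with a pooler and a single linear classification head. The feed-forward width (3,072) and the two token-type embeddings are BERT-base defaults and are not stated in this card.

```python
# Hyperparameters stated in the model card.
VOCAB_SIZE = 10_010
HIDDEN = 768
LAYERS = 12
MAX_POSITIONS = 512
NUM_CLASSES = 3_590

# Assumptions (BERT-base defaults, not stated in the card).
INTERMEDIATE = 3_072  # feed-forward width
TYPE_VOCAB = 2        # segment (token type) embeddings


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def count_parameters():
    embeddings = (
        VOCAB_SIZE * HIDDEN        # token embeddings
        + MAX_POSITIONS * HIDDEN   # position embeddings
        + TYPE_VOCAB * HIDDEN      # token type embeddings
        + layer_norm(HIDDEN)
    )
    # Query, key, value, and output projections. The number of heads
    # does not change the count because the heads split HIDDEN evenly.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    encoder = LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)
    classifier = linear(HIDDEN, NUM_CLASSES)
    return embeddings + encoder + pooler + classifier


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the tally comes to 96,489,734, consistent with the parameter count listed above. The classification head alone accounts for about 2.8 million of those parameters because of the 3,590 output classes.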

Intended Use

Primary Use Cases

  • Glycan structure prediction from MS/MS spectra
  • High-throughput glycomics analysis
  • Structural annotation of mass spectrometry data
  • Research applications in glycobiology and glycoinformatics

Ethical Considerations

Responsible Use

  • Model predictions should be validated experimentally
  • Not intended for direct clinical decision-making without proper validation
  • Users should understand the model's limitations and scope

Potential Risks

  • Over-reliance on computational predictions without experimental validation
  • Misinterpretation of confidence scores as absolute certainty
  • Application to data that differs significantly from the training distribution
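The point about confidence scores can be illustrated with a short sketch. A softmax over the class logits always sums to 1, so a high top score only means the class is favored relative to the other classes the model knows, not that the prediction is correct. The thresholds `min_top1` and `min_margin` and the helper `should_flag` below are hypothetical choices for illustration, not values recommended by the model authors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (class_index, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    return ranked[:k]


def should_flag(logits, min_top1=0.9, min_margin=0.5):
    """Flag a prediction for manual or experimental follow-up.

    Hypothetical rule: flag when the top probability is low or when
    the runner-up is close behind.
    """
    (_, p1), (_, p2) = top_k(logits, k=2)
    return p1 < min_top1 or (p1 - p2) < min_margin


ambiguous = [2.0, 1.9, -5.0]  # two candidate structures nearly tied
decisive = [10.0, 0.0, 0.0]   # one candidate clearly favored
print(top_k(ambiguous, k=2), should_flag(ambiguous))
print(top_k(decisive, k=2), should_flag(decisive))
```

Even a prediction that passes such a check can be wrong when the input spectrum lies outside the training distribution, which is why experimental validation remains necessary.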

Model Version

  • GlycoBERT-F: version trained on the full dataset

Code and Data Availability

  • Model weights: Safetensors format, 96.5M parameters, F32 tensors