Model Details
- Model Name: GlycoBERT
- Model Type: Transformer-based sequence classifier for glycan structure prediction
- Architecture: BERT (Bidirectional Encoder Representations from Transformers)
- Version: 1.0
- Date: June 25, 2025
Model Description
GlycoBERT is a transformer-based deep learning model designed to predict glycan structures from tandem mass spectrometry (MS/MS) data. The model treats mass spectra as tokenized sequences ("MS sentences") and performs multi-class classification to assign spectra to one of 3,590 possible glycan structure classes.
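The card does not specify how spectra are tokenized, so the following is only a minimal sketch of the "MS sentence" idea: peaks are discretized into fixed-width m/z bins (here 0.1 Da, nearest-bin rounding) and each surviving peak becomes one token. The function name, bin width, and intensity cutoff are illustrative assumptions, not GlycoBERT's actual scheme.

```python
# Hypothetical sketch: turning an MS/MS peak list into an "MS sentence".
# Assumption: m/z values are discretized into fixed-width bins, one token
# per peak, with low-intensity peaks dropped as noise. GlycoBERT's real
# tokenizer may differ.

def spectrum_to_ms_sentence(peaks, bin_width=0.1, intensity_cutoff=0.01):
    """Convert (m/z, relative intensity) pairs into a space-separated token string."""
    tokens = []
    for mz, intensity in sorted(peaks):
        if intensity < intensity_cutoff:
            continue  # skip noise peaks below the cutoff
        tokens.append(f"mz_{round(mz / bin_width)}")  # nearest-bin rounding
    return " ".join(tokens)

example_peaks = [(204.09, 0.8), (366.14, 0.5), (528.19, 0.3)]
print(spectrum_to_ms_sentence(example_peaks))  # → mz_2041 mz_3661 mz_5282
```

A string like this can then be fed to a standard BERT tokenizer with a vocabulary built from the observed bin tokens.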
Architecture Details
- Base Architecture: BertForSequenceClassification
- Parameters: 96 million
- Layers: 12 transformer layers
- Attention Heads: 12 per layer
- Hidden Size: 768 dimensions
- Max Sequence Length: 512 tokens
- Vocabulary Size: 10,010 tokens
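The stated hyperparameters can be sanity-checked against the quoted parameter count. The sketch below follows the standard BERT layout (embeddings, 12 encoder layers, pooler, classification head); exact totals may vary slightly between implementations, but it lands close to the 96 million figure above.

```python
# Back-of-the-envelope parameter count for the stated configuration:
# 12 layers, hidden size 768, vocab 10,010, max length 512, 3,590 classes.
# Standard BERT layout assumed (GELU intermediate size = 4 * hidden = 3,072).

def bert_param_count(vocab=10_010, hidden=768, layers=12, max_pos=512,
                     intermediate=3_072, num_classes=3_590):
    emb = (vocab + max_pos + 2) * hidden + 2 * hidden   # token/position/type embeddings + LayerNorm
    attn = 4 * (hidden * hidden + hidden) + 2 * hidden  # Q, K, V, output projections + LayerNorm
    ffn = (hidden * intermediate + intermediate) \
        + (intermediate * hidden + hidden) + 2 * hidden # two dense layers + LayerNorm
    pooler = hidden * hidden + hidden                   # BERT pooler
    head = hidden * num_classes + num_classes           # classification head
    return emb + layers * (attn + ffn) + pooler + head

print(f"{bert_param_count() / 1e6:.1f}M")  # → 96.5M
```

This agrees with the "96 million" figure, most of which (about 85M) sits in the 12 encoder layers.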
Intended Use
Primary Use Cases
- Glycan structure prediction from MS/MS spectra
- High-throughput glycomics analysis
- Structural annotation of mass spectrometry data
- Research applications in glycobiology and glycoinformatics
Ethical Considerations
Responsible Use
- Model predictions should be validated experimentally
- Not intended for direct clinical decision-making without proper validation
- Users should understand the model's limitations and scope
Potential Risks
- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data significantly different from training distribution
Model Version
- GlycoBERT-F: version trained on the full dataset
Code and Data Availability
- Repository: GitHub (glycotrans)
- Training Data:
- Example Inference: Google Colab