---
license: apache-2.0
tags:
- glycan
- glycomics
- glycoinformatics
- mass_spectrometry
---
## Model Details
- **Model Name:** GlycoBERT
- **Model Type:** Transformer-based sequence classifier for glycan structure prediction
- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Version:** 1.0
- **Date:** 6/25/2025
### Model Description
GlycoBERT is a transformer-based deep learning model designed to predict glycan structures from tandem mass spectrometry (MS/MS) data. The model treats mass spectra as tokenized sequences ("MS sentences") and performs multi-class classification to assign spectra to one of 3,590 possible glycan structure classes.
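
To make the "MS sentence" idea concrete, here is a minimal sketch of how a peak list could be turned into a token sequence. The model card does not describe the actual tokenization, so everything here is an assumption for illustration only: the function name `spectrum_to_sentence`, the `mz_<bin>` token format, the 1.0 m/z bin width, and the rule of keeping the most intense peaks are all hypothetical. Only the 512-token limit and the BERT-style `[CLS]`/`[SEP]` markers come from the architecture described in this card.

```python
from typing import List, Tuple

MAX_LEN = 512  # maximum sequence length stated in the model card


def spectrum_to_sentence(
    peaks: List[Tuple[float, float]],
    bin_width: float = 1.0,  # hypothetical m/z bin width, not from the card
    max_len: int = MAX_LEN,
) -> List[str]:
    """Convert (m/z, intensity) peaks into a list of string tokens.

    Hypothetical scheme: keep the most intense peaks that fit within the
    length budget, order them by m/z, and emit one token per binned m/z.
    """
    budget = max_len - 2  # leave room for [CLS] and [SEP]
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:budget]
    ordered = sorted(strongest, key=lambda p: p[0])
    tokens = [f"mz_{int(mz // bin_width)}" for mz, _ in ordered]
    return ["[CLS]"] + tokens + ["[SEP]"]


example_peaks = [(204.09, 100.0), (366.14, 50.0), (163.06, 10.0)]
sentence = spectrum_to_sentence(example_peaks)
print(sentence)
```

A real pipeline would map these tokens to IDs with the tokenizer shipped with the model; the linked repository and Colab notebook are the authoritative source for the actual preprocessing.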
### Architecture Details
- **Base Architecture:** BertForSequenceClassification
- **Parameters:** 96 million
- **Layers:** 12 transformer layers
- **Attention Heads:** 12 per layer
- **Hidden Size:** 768 dimensions
- **Max Sequence Length:** 512 tokens
- **Vocabulary Size:** 10,010 tokens
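
As a sanity check on the figures above, the parameter count can be estimated from the listed hyperparameters. This is a back-of-the-envelope sketch, not a dump of the released weights. It assumes standard BERT defaults that the card does not state: a feed-forward width of 4 × hidden size (3072), two token-type embeddings, and the usual pooler plus a single linear classification head over the 3,590 classes.

```python
# Hyperparameters stated in the model card.
VOCAB_SIZE = 10_010
HIDDEN = 768
LAYERS = 12
MAX_POSITIONS = 512
NUM_CLASSES = 3_590

# Assumptions (not stated in the card): standard BERT defaults.
INTERMEDIATE = 4 * HIDDEN  # feed-forward width, 3072
TYPE_VOCAB = 2             # token-type (segment) embeddings


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def count_parameters() -> int:
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections, then LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm(HIDDEN)
    )
    encoder = LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)
    classifier = linear(HIDDEN, NUM_CLASSES)
    return embeddings + encoder + pooler + classifier


total = count_parameters()
print(f"{total / 1e6:.1f} M parameters")
```

Under these assumptions the estimate comes to roughly 96.5 million, which is consistent with the 96 million listed above. The number of attention heads does not affect the count, since the heads split the same 768-dimensional projections.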
## Intended Use
### Primary Use Cases
- Glycan structure prediction from MS/MS spectra
- High-throughput glycomics analysis
- Structural annotation of mass spectrometry data
- Research applications in glycobiology and glycoinformatics
## Ethical Considerations
### Responsible Use
- Model predictions should be validated experimentally
- Not intended for direct clinical decision-making without proper validation
- Users should understand the model's limitations and scope
### Potential Risks
- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data significantly different from training distribution
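
The point about confidence scores can be illustrated with a short sketch that turns classifier logits into probabilities and flags low-confidence predictions for manual or experimental follow-up. The helper names and the 0.9 threshold are hypothetical choices for illustration, not part of GlycoBERT. Even a high softmax probability is not a calibrated guarantee, especially for spectra unlike the training data.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most probable (class_index, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def needs_review(logits: List[float], min_prob: float = 0.9) -> bool:
    """Flag a prediction whose top probability is below a hypothetical threshold."""
    best_prob = top_k(logits, k=1)[0][1]
    return best_prob < min_prob


# Toy logits over three classes; the real model outputs 3,590 logits.
toy_logits = [2.0, 1.0, 0.1]
print(top_k(toy_logits, k=2))
print(needs_review(toy_logits))
```

Reporting the top few candidates rather than a single label makes it easier to see when the model is torn between closely related structures.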
## Model Version
- **GlycoBERT-F:** Version trained on the full dataset
## Code and Data Availability
- **Repository:** [GitHub (glycotrans)](https://github.com/cabsel/glycotrans)
- **Training Data:** [Zenodo (DOI: 10.5281/zenodo.15741423)](https://doi.org/10.5281/zenodo.15741423)
- **Example Inference:** [Google Colab](https://colab.research.google.com/drive/1otVLVDQfLyldtIFcBxGnwVf9PeeTnJ17?usp=sharing)