---
license: apache-2.0
tags:
- glycan
- glycomics
- glycoinformatics
- mass_spectrometry
---

## Model Details

**Model Name:** GlycoBERT

**Model Type:** Transformer-based sequence classifier for glycan structure prediction

**Architecture:** BERT (Bidirectional Encoder Representations from Transformers)

**Version:** 1.0

**Date:** June 25, 2025

### Model Description

GlycoBERT is a transformer-based deep learning model that predicts glycan structures from tandem mass spectrometry (MS/MS) data. The model treats each mass spectrum as a tokenized sequence (an "MS sentence") and performs multi-class classification, assigning the spectrum to one of 3,590 glycan structure classes.
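
To make the "MS sentence" idea concrete, here is a minimal sketch of how a peak list might be turned into word-like tokens. The scheme is an assumption for illustration only: rounding m/z values to integers, keeping the most intense peaks, and wrapping the result in the standard BERT `[CLS]`/`[SEP]` markers are not taken from this card, and the function name `spectrum_to_ms_sentence` is hypothetical. The actual preprocessing is defined in the linked repository.

```python
from typing import List, Tuple


def spectrum_to_ms_sentence(
    peaks: List[Tuple[float, float]],
    max_tokens: int = 510,
) -> List[str]:
    """Turn (m/z, intensity) peaks into word-like tokens.

    Hypothetical scheme, not GlycoBERT's documented tokenizer:
    keep the most intense peaks, then emit each m/z value rounded
    to the nearest integer, in ascending m/z order.
    """
    # Keep the strongest peaks so the sequence fits a 512-token limit
    # (two slots are reserved for the assumed [CLS] and [SEP] markers).
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_tokens]
    # Order by m/z so the sentence reads from low to high mass.
    ordered = sorted(strongest, key=lambda p: p[0])
    return [str(round(mz)) for mz, _ in ordered]


# Toy peak list: (m/z, intensity) pairs with made-up values.
peaks = [(204.09, 850.0), (366.14, 1200.0), (138.05, 300.0), (528.19, 90.0)]
sentence = ["[CLS]"] + spectrum_to_ms_sentence(peaks, max_tokens=3) + ["[SEP]"]
print(" ".join(sentence))  # [CLS] 138 204 366 [SEP]
```

With `max_tokens=3`, the weakest peak (m/z 528.19) is dropped and the remaining three are emitted in mass order. A classifier then maps the whole token sequence to a single class label.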

### Architecture Details

- **Base Architecture:** BertForSequenceClassification
- **Parameters:** 96 million
- **Layers:** 12 transformer layers
- **Attention Heads:** 12 per layer
- **Hidden Size:** 768 dimensions
- **Max Sequence Length:** 512 tokens
- **Vocabulary Size:** 10,010 tokens
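
As a sanity check, the parameter count can be estimated from the dimensions listed above. The sketch below counts the weights of a standard BERT encoder with a pooler and a linear classification head. Two values are assumptions, since the card does not state them: a feed-forward (intermediate) size of 3,072 and a token-type vocabulary of 2, both the usual BERT-base defaults. The helper name `bert_classifier_param_count` is hypothetical.

```python
def bert_classifier_param_count(
    vocab_size: int = 10_010,
    hidden: int = 768,
    layers: int = 12,
    max_positions: int = 512,
    num_labels: int = 3_590,
    intermediate: int = 3_072,  # assumed: BERT-base default, not stated in the card
    type_vocab: int = 2,        # assumed: BERT-base default, not stated in the card
) -> int:
    """Count trainable parameters of a BERT encoder plus classification head."""

    def linear(n_in: int, n_out: int) -> int:
        # Weight matrix plus bias vector.
        return n_in * n_out + n_out

    layer_norm = 2 * hidden  # scale and shift vectors

    # Word, position, and token-type embeddings, followed by LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, followed by LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm
    # Two-layer feed-forward block, followed by LayerNorm.
    feed_forward = (
        linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
    )
    encoder = layers * (attention + feed_forward)
    pooler = linear(hidden, hidden)
    classifier = linear(hidden, num_labels)
    return embeddings + encoder + pooler + classifier


total = bert_classifier_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to roughly 96.5 million, consistent with the figure listed above. The number of attention heads does not appear in the calculation because the heads split the hidden dimension rather than add weights.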

## Intended Use

### Primary Use Cases

- Glycan structure prediction from MS/MS spectra
- High-throughput glycomics analysis
- Structural annotation of mass spectrometry data
- Research applications in glycobiology and glycoinformatics

## Ethical Considerations

### Responsible Use

- Model predictions should be validated experimentally
- The model is not intended for direct clinical decision-making without proper validation
- Users should understand the model's limitations and scope

### Potential Risks

- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data that differs significantly from the training distribution
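
The point about confidence scores can be illustrated with a short sketch. A classifier's logits are usually converted to probabilities with a softmax, which always sums to 1 across the known classes, so even an out-of-distribution spectrum receives a "most likely" class. The helper names and the thresholds `min_top` and `min_margin` below are hypothetical choices for illustration, not values recommended by the model authors.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: List[float], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most probable (class_index, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def flag_for_review(
    probs: List[float],
    min_top: float = 0.9,     # hypothetical threshold
    min_margin: float = 0.5,  # hypothetical threshold
) -> bool:
    """Flag a prediction whose top score is low or close to the runner-up."""
    (_, best), (_, second) = top_k(probs, k=2)
    return best < min_top or (best - second) < min_margin


# Toy logits over four classes; a real model would emit 3,590 values.
ambiguous = softmax([4.0, 3.5, 0.1, -2.0])
confident = softmax([10.0, 0.0, 0.0, 0.0])
print(top_k(ambiguous, k=2), flag_for_review(ambiguous))
print(top_k(confident, k=1), flag_for_review(confident))
```

In the first case the top two classes are close, so the prediction is flagged for manual or experimental follow-up. In the second case the top class dominates and is not flagged. A high score still only ranks a structure against the classes the model was trained on and does not replace validation.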

## Model Version

- **GlycoBERT-F:** Trained on the full dataset

### Code and Data Availability

- **Repository:** [GitHub (glycotrans)](https://github.com/cabsel/glycotrans)
- **Training Data:** [Zenodo (doi:10.5281/zenodo.15741423)](https://doi.org/10.5281/zenodo.15741423)
- **Example Inference:** [Google Colab](https://colab.research.google.com/drive/1otVLVDQfLyldtIFcBxGnwVf9PeeTnJ17?usp=sharing)