---
license: apache-2.0
tags:
- glycan
- glycomics
- glycoinformatics
- mass_spectrometry
---

## Model Details

- **Model Name:** GlycoBERT
- **Model Type:** Transformer-based sequence classifier for glycan structure prediction
- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Version:** 1.0
- **Date:** 6/25/2025

### Model Description

GlycoBERT is a transformer-based deep learning model designed to predict glycan structures from tandem mass spectrometry (MS/MS) data. The model treats mass spectra as tokenized sequences ("MS sentences") and performs multi-class classification to assign spectra to one of 3,590 possible glycan structure classes.

### Architecture Details

- **Base Architecture:** BertForSequenceClassification
- **Parameters:** 96 million
- **Layers:** 12 transformer layers
- **Attention Heads:** 12 per layer
- **Hidden Size:** 768 dimensions
- **Max Sequence Length:** 512 tokens
- **Vocabulary Size:** 10,010 tokens

## Intended Use

### Primary Use Cases

- Glycan structure prediction from MS/MS spectra
- High-throughput glycomics analysis
- Structural annotation of mass spectrometry data
- Research applications in glycobiology and glycoinformatics

## Ethical Considerations

### Responsible Use

- Model predictions should be validated experimentally
- Not intended for direct clinical decision-making without proper validation
- Users should understand the model's limitations and scope

### Potential Risks

- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data significantly different from training distribution

## Model Version

- **GlycoBERT-F:** Version trained on full dataset

### Code and Data Availability

- **Repository:** [GitHub (glycotrans)](https://github.com/cabsel/glycotrans)
- **Training Data:** [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15741423.svg)](https://doi.org/10.5281/zenodo.15741423)
- **Example Inference:** [Google Colab](https://colab.research.google.com/drive/1otVLVDQfLyldtIFcBxGnwVf9PeeTnJ17?usp=sharing)
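
The figures listed under Architecture Details can be cross-checked with a back-of-the-envelope parameter count. The sketch below is not taken from the glycotrans repository. It uses the values stated in this card (vocabulary size, hidden size, layer count, maximum sequence length, number of classes) and assumes standard BERT-base defaults for everything the card does not state: a feed-forward size of 3,072, two token-type embeddings, and a pooler layer ahead of the classification head. Under those assumptions the total comes to roughly 96.5 million, consistent with the stated 96 million parameters.

```python
# Back-of-the-envelope parameter count for a BERT sequence classifier.
# "from the card" = stated under Architecture Details / Model Description.
# "assumed"       = BERT-base default that the card does not state.

VOCAB_SIZE = 10_010     # from the card
HIDDEN = 768            # from the card
LAYERS = 12             # from the card
MAX_POSITIONS = 512     # from the card
NUM_CLASSES = 3_590     # from the card
INTERMEDIATE = 3_072    # assumed: 4 * hidden size
TYPE_VOCAB = 2          # assumed: segment (token-type) embeddings


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale plus shift parameters of a LayerNorm."""
    return 2 * n


def count_parameters() -> dict:
    # Word, position, and token-type embeddings, followed by a LayerNorm.
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections. The number of attention
    # heads splits the hidden size but does not change the parameter count.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    encoder = LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)           # assumed to be present
    classifier = linear(HIDDEN, NUM_CLASSES)  # one logit per glycan class
    total = embeddings + encoder + pooler + classifier
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "pooler": pooler,
        "classifier": classifier,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in count_parameters().items():
        print(f"{name:>10}: {value:>12,}")
```

The encoder layers account for most of the total. The classification head contributes about 2.8 million parameters because it maps the 768-dimensional pooled representation to 3,590 classes.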