Model Details

Model Name: GlycoBERT
Model Type: Transformer-based sequence classifier for glycan structure prediction
Architecture: BERT (Bidirectional Encoder Representations from Transformers)
Version: 1.0
Date: 6/25/2025

Model Description

GlycoBERT is a transformer-based deep learning model that predicts glycan structures from tandem mass spectrometry (MS/MS) data. The model treats each mass spectrum as a tokenized sequence (an "MS sentence") and performs multi-class classification, assigning the spectrum to one of 3,590 possible glycan structure classes.
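The card does not spell out how a spectrum becomes an "MS sentence", so the following is only a minimal sketch of one plausible scheme, not GlycoBERT's actual tokenizer. It assumes that peaks are binned by m/z into integer-labelled tokens, that only the most intense peaks are kept to fit the sequence limit, and that BERT-style [CLS] and [SEP] markers wrap the sequence. The function name, the bin width, the token format, and the peak values are all hypothetical.

```python
def spectrum_to_tokens(peaks, max_tokens=512, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a token sequence.

    Hypothetical scheme: keep the most intense peaks, order them by m/z,
    and label each one with the index of its m/z bin.
    """
    budget = max_tokens - 2  # reserve room for [CLS] and [SEP]
    # Keep the strongest peaks so the sequence fits the length limit.
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:budget]
    # Present the surviving peaks in ascending m/z order.
    ordered = sorted(strongest, key=lambda p: p[0])
    tokens = ["[CLS]"]
    for mz, _intensity in ordered:
        tokens.append(f"mz_{int(mz // bin_width)}")
    tokens.append("[SEP]")
    return tokens


# Illustrative peaks only; the intensities are made up.
example_peaks = [(204.09, 850.0), (366.14, 1200.0), (138.05, 300.0), (528.19, 95.0)]
print(spectrum_to_tokens(example_peaks))
# ['[CLS]', 'mz_138', 'mz_204', 'mz_366', 'mz_528', '[SEP]']
```

In this sketch intensity only decides which peaks survive truncation. A real tokenizer might also encode intensity or precursor mass, which the card does not describe.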

Architecture Details

Base Architecture: BertForSequenceClassification
Parameters: approximately 96.5 million
Layers: 12 transformer layers
Attention Heads: 12 per layer
Hidden Size: 768 dimensions
Max Sequence Length: 512 tokens
Vocabulary Size: 10,010 tokens
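As a sanity check, the listed dimensions can be added up to see whether they account for the reported parameter count. The sketch below does this arithmetic for a standard BERT encoder with a pooler and a linear classification head. Two values are assumptions because the card does not state them: a feed-forward width of 4 x 768 = 3072 and a token-type vocabulary of 2, both BERT-base defaults. The number of attention heads does not affect the count.

```python
# Values quoted from the architecture list above.
VOCAB_SIZE = 10_010
HIDDEN = 768
LAYERS = 12
MAX_POSITIONS = 512
NUM_CLASSES = 3_590

# Assumptions (not stated in the card): BERT-base defaults.
INTERMEDIATE = 4 * HIDDEN  # feed-forward width, 3072
TYPE_VOCAB = 2             # token-type (segment) embeddings


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def count_parameters():
    embeddings = (
        VOCAB_SIZE * HIDDEN        # token embeddings
        + MAX_POSITIONS * HIDDEN   # position embeddings
        + TYPE_VOCAB * HIDDEN      # token-type embeddings
        + layer_norm(HIDDEN)
    )
    # Query, key, value and output projections, then a LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    encoder = LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)
    classifier = linear(HIDDEN, NUM_CLASSES)
    return embeddings + encoder + pooler + classifier


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# 96,489,734 parameters (~96.5M)
```

Under these assumptions the encoder layers account for about 85 million parameters, and the 3,590-class output layer adds roughly 2.8 million.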

Intended Use

Primary Use Cases

Glycan structure prediction from MS/MS spectra
High-throughput glycomics analysis
Structural annotation of mass spectrometry data
Research applications in glycobiology and glycoinformatics

Ethical Considerations

Responsible Use

Model predictions should be validated experimentally
Not intended for direct clinical decision-making without proper validation
Users should understand the model's limitations and scope

Potential Risks

Over-reliance on computational predictions without experimental validation
Misinterpretation of confidence scores as absolute certainty
Application to data that differs significantly from the training distribution
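The risk of misreading confidence scores can be made concrete. A classifier of this kind produces one logit per class, and the usual confidence score is the softmax probability of the top class. The sketch below uses made-up logits for four classes (the real model has 3,590) to show a case where the best class wins with a probability under 0.6 while a second candidate sits close behind. The helper names and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (class_index, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a four-class toy problem.
logits = [4.0, 3.6, 0.5, -1.0]
for class_index, prob in top_k(logits):
    print(f"class {class_index}: p = {prob:.3f}")
```

Reporting the top few candidates with their probabilities, instead of a single label, makes near-ties like this one visible. Even a high softmax probability reflects the model's relative preference among its known classes and is not a calibrated measure of correctness.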

Model Version

GlycoBERT-F: Version trained on the full dataset

Code and Data Availability

Repository: GitHub (glycotrans)
Training Data:
Example Inference: Google Colab
