---
license: apache-2.0
tags:
- glycan
- glycomics
- glycoinformatics
- mass_spectrometry
---
## Model Details
- **Model Name:** GlycoBERT
- **Model Type:** Transformer-based sequence classifier for glycan structure prediction
- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Version:** 1.0
- **Date:** June 25, 2025
### Model Description
GlycoBERT is a transformer-based deep learning model designed to predict glycan structures from tandem mass spectrometry (MS/MS) data. The model treats mass spectra as tokenized sequences ("MS sentences") and performs multi-class classification, assigning each spectrum to one of 3,590 possible glycan structure classes.
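To make the "MS sentence" idea concrete, the sketch below turns a list of (m/z, intensity) peaks into a BERT-style token sequence. It is a hypothetical illustration only: the real GlycoBERT tokenization is defined in the glycotrans repository, and the 0.1 m/z rounding, the 1,000 m/z cutoff, intensity-based ordering, and the `spectrum_to_ms_sentence` name are assumptions chosen for illustration, not confirmed to reproduce the model's 10,010-token vocabulary. Only the 512-token limit comes from this card.

```python
from typing import List, Tuple

# Hypothetical tokenization parameters (assumptions, not the GlycoBERT scheme)
MZ_DECIMALS = 1   # assumed: each peak is rounded to 0.1 m/z and used as a "word"
MZ_MAX = 1000.0   # assumed upper m/z limit that keeps the vocabulary finite
MAX_LEN = 512     # maximum sequence length stated in the model card

def spectrum_to_ms_sentence(peaks: List[Tuple[float, float]],
                            max_len: int = MAX_LEN) -> List[str]:
    """Hypothetical: convert (m/z, intensity) peaks into an "MS sentence"."""
    # Order peaks by intensity so truncation drops the weakest peaks first
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    words = [f"{mz:.{MZ_DECIMALS}f}" for mz, _ in ranked if 0.0 <= mz < MZ_MAX]
    # Reserve two positions for the [CLS] and [SEP] special tokens
    words = words[: max_len - 2]
    return ["[CLS]"] + words + ["[SEP]"]

peaks = [(204.087, 1500.0), (366.140, 900.0), (528.193, 300.0), (1200.5, 50.0)]
print(spectrum_to_ms_sentence(peaks))
# ['[CLS]', '204.1', '366.1', '528.2', '[SEP]']
```

In a real pipeline these string tokens would then be mapped to integer IDs with the model's own tokenizer before being passed to the classifier.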
### Architecture Details
- **Base Architecture:** BertForSequenceClassification
- **Parameters:** 96 million
- **Layers:** 12 transformer layers
- **Attention Heads:** 12 per layer
- **Hidden Size:** 768 dimensions
- **Max Sequence Length:** 512 tokens
- **Vocabulary Size:** 10,010 tokens
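As a sanity check on the ~96 million parameter figure, the count can be reproduced from the hyperparameters above. This sketch assumes the standard BERT-base layout used by Hugging Face `BertForSequenceClassification`: a feed-forward size of 4 × 768 = 3,072 and two token types (neither is stated in this card), a pooler, and one linear head over the 3,590 classes. The number of attention heads does not change the count, because the 12 heads split the 768-dimensional projections rather than adding weights.

```python
def bert_classifier_params(vocab_size: int, hidden: int, layers: int, max_pos: int,
                           num_labels: int, intermediate: int = 0,
                           type_vocab: int = 2) -> int:
    """Count parameters of a BERT encoder + pooler + linear classification head."""
    intermediate = intermediate or 4 * hidden  # assumed BERT-base convention (3072)
    # Word, position and token-type embeddings, plus embedding LayerNorm
    embeddings = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden
    # Q, K, V and attention output projections (weights + biases), plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections (weights + biases), plus LayerNorm
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    encoder = layers * (attention + ffn)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + encoder + pooler + classifier

total = bert_classifier_params(vocab_size=10_010, hidden=768, layers=12,
                               max_pos=512, num_labels=3_590)
print(f"{total:,}")  # 96,489,734
```

With `transformers` installed, building `BertForSequenceClassification(BertConfig(vocab_size=10010, num_labels=3590))` (whose other defaults match BERT-base) and summing `p.numel()` over its parameters should report the same total.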
## Intended Use
### Primary Use Cases
- Glycan structure prediction from MS/MS spectra
- High-throughput glycomics analysis
- Structural annotation of mass spectrometry data
- Research applications in glycobiology and glycoinformatics
## Ethical Considerations
### Responsible Use
- Model predictions should be validated experimentally
- Not intended for direct clinical decision-making without proper validation
- Users should understand the model's limitations and scope
### Potential Risks
- Over-reliance on computational predictions without experimental validation
- Misinterpretation of confidence scores as absolute certainty
- Application to data significantly different from training distribution
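One way to avoid treating a prediction as certain is to inspect the full ranked distribution rather than only the top label. The sketch below assumes that confidence scores are softmax probabilities over the classifier's logits (the usual convention for `BertForSequenceClassification` outputs, not confirmed by this card); the `top_k_predictions` and `needs_review` names and the 0.9 / 0.2 thresholds are hypothetical and arbitrary. Note that softmax probabilities always sum to 1 over the 3,590 trained classes, so even a spectrum whose true structure lies outside that label set will receive a confident-looking top class.

```python
import math
from typing import List, Sequence, Tuple

def top_k_predictions(logits: Sequence[float], k: int = 5) -> List[Tuple[int, float]]:
    """Softmax over class logits; return the k most probable (class_index, probability) pairs."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

def needs_review(logits: Sequence[float], min_prob: float = 0.9,
                 min_margin: float = 0.2) -> bool:
    """Hypothetical rule: flag a spectrum when the top class is weak or not clearly ahead."""
    (_, p1), (_, p2) = top_k_predictions(logits, k=2)
    return p1 < min_prob or (p1 - p2) < min_margin

print(needs_review([4.0, 1.0, 0.5]))   # False: top class is well separated
print(needs_review([2.0, 1.9, -1.0]))  # True: two classes are nearly tied
```

Flagged spectra would then be candidates for the experimental validation recommended above.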
## Model Version
- **GlycoBERT-F:** version trained on the full dataset
## Code and Data Availability
- **Repository:** [GitHub (glycotrans)](https://github.com/cabsel/glycotrans)
- **Training Data:** [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15741423.svg)](https://doi.org/10.5281/zenodo.15741423)
- **Example Inference:** [Google Colab](https://colab.research.google.com/drive/1otVLVDQfLyldtIFcBxGnwVf9PeeTnJ17?usp=sharing)