---
license: mit
language: en
tags:
- peptide
- biology
- drug-discovery
- HELM
- helm-notation
- cyclic-peptide
- peptide-language-model
pipeline_tag: fill-mask
widget:
- text: "PEPTIDE1{A.C.D.E.F}$$$$"
---

# HELM-BERT

A language model for peptide representation learning using **HELM (Hierarchical Editing Language for Macromolecules)** notation.

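For readers new to HELM: a peptide is written as one or more polymer sections followed by `$`-delimited sections for connections and annotations. Two illustrative strings, a linear pentapeptide and its head-to-tail cyclized form (our examples, following the public HELM specification; they are not taken from this model's training data):

```
PEPTIDE1{A.C.D.E.F}$$$$
PEPTIDE1{A.C.D.E.F}$PEPTIDE1,PEPTIDE1,1:R1-5:R2$$$
```

In the second string, the connection section joins residue 1's R1 attachment point (N-terminus) to residue 5's R2 (C-terminus), closing the backbone into a ring.
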
## Model Description

HELM-BERT is a BERT-style encoder designed specifically for peptide sequences in HELM notation. It incorporates several architectural features:

- **Disentangled Attention**: Separate content and position representations (DeBERTa-style)
- **Enhanced Mask Decoder (EMD)**: Absolute position encoding for MLM pretraining
- **Span Masking**: Contiguous token masking for improved contextual learning (sketched below)
- **nGiE**: n-gram Induced Encoding layer for local pattern recognition

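As a rough illustration of the span-masking objective: rather than corrupting tokens independently, whole contiguous spans are replaced by the mask token, and only those positions contribute to the MLM loss. The sketch below is ours; the function name, defaults, and overlap-tolerant loop are illustrative, not the model's actual pretraining code.

```python
import torch

def span_mask(input_ids, mask_token_id, mask_prob=0.15, max_span=3):
    """Mask contiguous token spans for MLM (illustrative sketch only)."""
    input_ids = input_ids.clone()
    labels = torch.full_like(input_ids, -100)  # -100 positions are ignored by the MLM loss
    seq_len = input_ids.size(1)
    budget = max(1, int(seq_len * mask_prob))  # total number of tokens to mask
    masked = 0
    while masked < budget:
        span = torch.randint(1, max_span + 1, (1,)).item()             # random span length
        start = torch.randint(0, max(1, seq_len - span + 1), (1,)).item()
        labels[:, start:start + span] = input_ids[:, start:start + span]
        input_ids[:, start:start + span] = mask_token_id               # spans may overlap; fine for a sketch
        masked += span
    return input_ids, labels
```
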
## How to Use

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Flansma/helm-bert", trust_remote_code=True)

inputs = tokenizer("PEPTIDE1{A.C.D.E.F}$$$$", return_tensors="pt")
outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # per-token embeddings, (batch, seq_len, hidden_size)
```

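`embeddings` holds one vector per token. If you need a single fixed-size vector per peptide, one common choice (a suggestion on our part, not something the model prescribes) is masked mean pooling over the tokens, continuing from the snippet above:

```python
# Mean-pool token embeddings, excluding padding positions.
mask = inputs["attention_mask"].unsqueeze(-1)   # (batch, seq_len, 1)
summed = (embeddings * mask).sum(dim=1)         # padding contributes zero
pooled = summed / mask.sum(dim=1).clamp(min=1)  # (batch, hidden_size)
```
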
## Training Data

Pretrained on deduplicated peptide sequences (deduplication sketched after the list) from:
- **ChEMBL**: Bioactive molecules database
- **CycPeptMPDB**: Cyclic peptide membrane permeability database
- **Propedia**: Protein-peptide interaction database

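The card does not detail the deduplication step; in its simplest form it amounts to keeping one copy of each exact HELM string, e.g.:

```python
def dedupe(helm_strings):
    """Keep the first occurrence of each exact HELM string (illustrative only)."""
    seen = set()
    unique = []
    for s in helm_strings:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique
```
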
## Citation

```bibtex
@misc{helm-bert,
  title={HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction},
  author={Seungeon Lee},
  year={2025},
  url={https://huggingface.co/Flansma/helm-bert}
}
```

## License

MIT License