---
license: mit
language: en
tags:
- peptide
- biology
- drug-discovery
- HELM
- helm-notation
- cyclic-peptide
- peptide-language-model
pipeline_tag: fill-mask
widget:
- text: "PEPTIDE1{[Abu].[Sar].[meL].V.[meL].A.[dA].[meL].[meL].[meV].[Me_Bmt(E)]}$PEPTIDE1,PEPTIDE1,1:R1-11:R2$$$"
---
|
|
|
|
|
# HELM-BERT |
|
|
|
|
|
A language model for peptide representation learning using **HELM (Hierarchical Editing Language for Macromolecules)** notation. |
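
In HELM notation, a simple-polymer section such as `PEPTIDE1{...}` lists monomers separated by `.`, with non-standard monomers (e.g. `[Abu]`, `[meL]`) in square brackets; the `$`-delimited section that follows declares bonds beyond the linear backbone. In the Cyclosporine A string used throughout this card, `PEPTIDE1,PEPTIDE1,1:R1-11:R2` bonds residue 1 (attachment point R1) to residue 11 (attachment point R2), closing the macrocycle, and the trailing `$$$` are empty polymer-group/annotation sections.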
|
|
|
|
|
[GitHub repository](https://github.com/clinfo/HELM-BERT)
|
|
|
|
|
## Model Description |
|
|
|
|
|
HELM-BERT is built on the DeBERTa architecture and adapted to peptide sequences written in HELM notation:

- **Disentangled Attention**: Decomposes attention into content-content and content-position terms
- **Enhanced Mask Decoder (EMD)**: Injects absolute position embeddings at the decoder stage
- **Span Masking**: Masks contiguous spans of tokens, with span lengths drawn from a geometric distribution (see the sketch after this list)
- **nGiE**: n-gram Induced Encoding layer (1D convolution, kernel size 3)
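
The span-masking objective is easy to picture with a small sketch. The values of `p`, the masking rate, and the span cap below are illustrative assumptions, not HELM-BERT's published hyperparameters:

```python
import numpy as np

def span_mask(tokens, mask_token="[MASK]", mask_rate=0.15, p=0.2, max_span=10):
    """Mask contiguous token spans with geometrically distributed lengths."""
    tokens = list(tokens)
    budget = max(1, round(len(tokens) * mask_rate))  # total positions to mask
    masked = 0
    while masked < budget:
        # Span length ~ Geometric(p), capped so it fits the sequence and the budget
        span = min(int(np.random.geometric(p)), max_span, budget - masked, len(tokens))
        start = np.random.randint(0, len(tokens) - span + 1)
        tokens[start:start + span] = [mask_token] * span
        masked += span  # note: overlapping spans may re-mask positions in this sketch
    return tokens

# Example on monomer-level tokens of a short peptide
print(span_mask("[Abu] [Sar] [meL] V [meL] A [dA] [meL] [meL] [meV]".split()))
```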
|
|
|
|
|
## Model Specifications |
|
|
|
|
|
| Parameter | Value |
|-----------|-------|
| Parameters | 54.8M |
| Hidden size | 768 |
| Layers | 6 |
| Attention heads | 12 |
| Vocab size | 78 |
| Max token length | 512 |
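
These figures can be sanity-checked directly from the loaded checkpoint, using only standard `transformers`/PyTorch calls:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Flansma/helm-bert", trust_remote_code=True)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")                       # expect ~54.8M
print(model.config.hidden_size, model.config.num_hidden_layers)  # expect 768, 6
```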
|
|
|
|
|
## How to Use |
|
|
|
|
|
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Flansma/helm-bert", trust_remote_code=True)

# Cyclosporine A in HELM notation (head-to-tail macrocycle)
inputs = tokenizer(
    "PEPTIDE1{[Abu].[Sar].[meL].V.[meL].A.[dA].[meL].[meL].[meV].[Me_Bmt(E)]}$PEPTIDE1,PEPTIDE1,1:R1-11:R2$$$",
    return_tensors="pt",
)
outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # per-token embeddings, shape (1, seq_len, 768)
```
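
`last_hidden_state` is per-token; to get one fixed-size vector per peptide for downstream property models, a common choice (not necessarily what the HELM-BERT paper uses) is attention-masked mean pooling:

```python
# Continuing from the snippet above
mask = inputs["attention_mask"].unsqueeze(-1)                   # (1, seq_len, 1)
peptide_vec = (embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # (1, 768)
```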
|
|
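The card's `fill-mask` pipeline tag suggests masked-monomer prediction also works through the standard pipeline API; a sketch (the exact mask token is whatever this tokenizer defines, hence the `fill.tokenizer.mask_token` lookup):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="Flansma/helm-bert", trust_remote_code=True)

# Mask one monomer of the Cyclosporine A string and predict replacements
helm = ("PEPTIDE1{[Abu].[Sar]." + fill.tokenizer.mask_token
        + ".V.[meL].A.[dA].[meL].[meL].[meV].[Me_Bmt(E)]}"
          "$PEPTIDE1,PEPTIDE1,1:R1-11:R2$$$")
for pred in fill(helm, top_k=3):
    print(pred["token_str"], f"{pred['score']:.3f}")
```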
|
|
|
## Training Data |
|
|
|
|
|
Pretrained on deduplicated peptide sequences from:

- **ChEMBL**: Bioactive molecules database
- **CycPeptMPDB**: Cyclic peptide membrane permeability database
- **Propedia**: Protein-peptide interaction database
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@article{lee2025helmbert,
  title={HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction},
  author={Seungeon Lee and Takuto Koyama and Itsuki Maeda and Shigeyuki Matsumoto and Yasushi Okuno},
  journal={arXiv preprint arXiv:2512.23175},
  year={2025},
  url={https://arxiv.org/abs/2512.23175}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
MIT License |
|
|
|