Flansma committed on
Commit dea0260 · verified · 1 Parent(s): 5e5108d

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ license: mit
+ language: en
+ tags:
+ - peptide
+ - biology
+ - drug-discovery
+ - HELM
+ - helm-notation
+ - cyclic-peptide
+ - peptide-language-model
+ pipeline_tag: fill-mask
+ widget:
+ - text: "PEPTIDE1{A.C.D.E.F}$$$$"
+ ---
+
+ # HELM-BERT
+
+ A language model for peptide representation learning using **HELM (Hierarchical Editing Language for Macromolecules)** notation.
+
+ ## Model Description
+
+ HELM-BERT is a BERT-style encoder designed specifically for peptide sequences in HELM notation. It incorporates several architectural innovations:
+
+ - **Disentangled Attention**: Separate content and position representations (DeBERTa-style)
+ - **Enhanced Mask Decoder (EMD)**: Absolute position encoding for MLM pretraining
+ - **Span Masking**: Contiguous token masking for improved contextual learning
+ - **nGiE**: n-gram Induced Encoding layer for local pattern recognition
+
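The span masking idea listed above can be illustrated with a minimal sketch. The README does not state the masking ratio, the span length distribution, or the mask token, so `mask_ratio`, `max_span`, and `mask_token` below are assumptions chosen for illustration, and `span_mask` is a hypothetical helper rather than the model's actual pretraining code.

```python
import random


def span_mask(tokens, mask_ratio=0.15, max_span=3, mask_token="[MASK]", seed=None):
    """Mask contiguous runs of tokens until a fixed budget of positions is covered.

    All defaults are illustrative assumptions, not values taken from HELM-BERT.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Number of positions to mask; always at least one.
    budget = max(1, round(len(tokens) * mask_ratio))
    covered = set()
    while len(covered) < budget:
        # Draw a span length, capped so the budget is never exceeded.
        span = min(rng.randint(1, max_span), budget - len(covered))
        start = rng.randrange(0, len(tokens) - span + 1)
        covered.update(range(start, start + span))
    masked = [mask_token if i in covered else tok for i, tok in enumerate(tokens)]
    return masked, sorted(covered)


tokens = list("ACDEFGHIKL")
masked, positions = span_mask(tokens, mask_ratio=0.3, seed=0)
print(masked, positions)
```

Unlike independent per-token masking, each draw hides a run of neighbouring monomers, so the model has to reconstruct them from more distant context.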
+ ## How to Use
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
+
+ inputs = tokenizer("PEPTIDE1{A.C.D.E.F}$$$$", return_tensors="pt")
+ outputs = model(**inputs)
+ embeddings = outputs.last_hidden_state
+ ```
+
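To make the input string in the example above easier to read, here is a small sketch that splits a simple HELM string into its polymer ID and monomer list. `parse_simple_helm` is a hypothetical helper written for illustration; it is not part of the model's tokenizer, and it assumes plain monomer symbols separated by dots. It ignores the sections after the first `$` and does not handle inline notation that itself contains dots or braces.

```python
import re


def parse_simple_helm(helm: str) -> dict[str, list[str]]:
    """Return {polymer_id: [monomers]} for the simple-polymer section of a HELM string."""
    # Everything before the first "$" is the simple polymer section.
    polymer_section = helm.split("$", 1)[0]
    polymers = {}
    # Each polymer looks like ID{monomer.monomer...}; polymers are joined by "|".
    for polymer_id, body in re.findall(r"([A-Z]+\d+)\{([^}]*)\}", polymer_section):
        polymers[polymer_id] = body.split(".")
    return polymers


print(parse_simple_helm("PEPTIDE1{A.C.D.E.F}$$$$"))
# → {'PEPTIDE1': ['A', 'C', 'D', 'E', 'F']}
```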
+ ## Training Data
+
+ Pretrained on deduplicated peptide sequences from:
+ - **ChEMBL**: Bioactive molecules database
+ - **CycPeptMPDB**: Cyclic peptide membrane permeability database
+ - **Propedia**: Protein-peptide interaction database
+
+ ## Citation
+
+ ```bibtex
+ @misc{helm-bert,
+ title={HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction},
+ author={Seungeon Lee},
+ year={2025},
+ url={https://huggingface.co/Flansma/helm-bert}
+ }
+ ```
+
+ ## License
+
+ MIT License