Flansma committed (verified)
Commit 96bf844 · 1 parent: dfc7876

Update architecture description and Cyclosporine A example

Files changed (1): README.md (+8 −7)
README.md CHANGED

@@ -11,7 +11,7 @@ tags:
 - peptide-language-model
 pipeline_tag: fill-mask
 widget:
-- text: "PEPTIDE1{A.C.D.E.F}$$$$"
+- text: "PEPTIDE1{[Abu].[Sar].[meL].V.[meL].A.[dA].[meL].[meL].[meV].[Me_Bmt(E)]}$PEPTIDE1,PEPTIDE1,1:R1-11:R2$$$"
 ---
 
 # HELM-BERT
@@ -20,12 +20,12 @@ A language model for peptide representation learning using **HELM (Hierarchical
 
 ## Model Description
 
-HELM-BERT is a BERT-style encoder designed specifically for peptide sequences in HELM notation. It incorporates several architectural innovations:
+HELM-BERT is built upon the DeBERTa architecture, designed for peptide sequences in HELM notation:
 
-- **Disentangled Attention**: Separate content and position representations (DeBERTa-style)
-- **Enhanced Mask Decoder (EMD)**: Absolute position encoding for MLM pretraining
-- **Span Masking**: Contiguous token masking for improved contextual learning
-- **nGiE**: n-gram Induced Encoding layer for local pattern recognition
+- **Disentangled Attention**: Decomposes attention into content-content and content-position terms
+- **Enhanced Mask Decoder (EMD)**: Injects absolute position embeddings at the decoder stage
+- **Span Masking**: Contiguous token masking with geometric distribution
+- **nGiE**: n-gram Induced Encoding layer (1D convolution, kernel size 3)
 
 Please check the [official repository](https://github.com/clinfo/HELM-BERT) for more implementation details and updates.
 
@@ -48,7 +48,8 @@ from transformers import AutoModel, AutoTokenizer
 model = AutoModel.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained("Flansma/helm-bert", trust_remote_code=True)
 
-inputs = tokenizer("PEPTIDE1{A.C.D.E.F}$$$$", return_tensors="pt")
+# Cyclosporine A
+inputs = tokenizer("PEPTIDE1{[Abu].[Sar].[meL].V.[meL].A.[dA].[meL].[meL].[meV].[Me_Bmt(E)]}$PEPTIDE1,PEPTIDE1,1:R1-11:R2$$$", return_tensors="pt")
 outputs = model(**inputs)
 embeddings = outputs.last_hidden_state
 ```
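The span-masking objective named in the updated description (contiguous token masking with geometrically distributed span lengths, SpanBERT-style) can be sketched as follows. This is a minimal illustration: the mask ratio, geometric parameter `p`, and span cap are assumed defaults, not HELM-BERT's published configuration.

```python
import random

def sample_span_masks(num_tokens, mask_ratio=0.15, p=0.2, max_span=10, seed=0):
    """Pick contiguous token spans to mask (SpanBERT-style sketch).

    Span lengths are drawn from a geometric distribution with success
    probability `p`, clipped at `max_span`. All parameter values here are
    illustrative assumptions, not HELM-BERT's actual settings.
    """
    rng = random.Random(seed)
    budget = max(1, int(num_tokens * mask_ratio))
    masked = set()
    while len(masked) < budget:
        # Geometric span length: extend the span with probability (1 - p).
        length = 1
        while rng.random() >= p and length < max_span:
            length += 1
        # Place the span uniformly at random; overlaps simply merge.
        start = rng.randrange(num_tokens - length + 1)
        masked.update(range(start, start + length))
    return sorted(masked)

print(sample_span_masks(100))
```

Masking whole spans rather than isolated tokens forces the model to predict multi-residue fragments from flanking context, which is the motivation the diff's "Span Masking" bullet alludes to.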