---

# **GPepT: A Language Model for Peptides and Peptidomimetics**
![GPepT graphical abstract](TOC.png)

GPepT is a cutting-edge language model designed to understand and generate sequences in the specialized domain of peptides and peptidomimetics. It serves as a powerful tool for _de novo_ protein design and engineering. As demonstrated in our research, incorporating peptidomimetics significantly broadens the chemical space accessible through generated sequences, enabling innovative approaches to peptide-based therapeutics.

## **Model Overview**
GPepT builds upon the GPT-2 Transformer architecture, comprising 36 layers and a model dimensionality of 1280, with a total of 738 million parameters. This decoder-only model has been pre-trained on a curated dataset of peptides and peptidomimetics mined from bioactivity-labeled chemical formulas in ChEMBL.
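
Because GPepT is a GPT-2-style decoder, it can be loaded with the standard Hugging Face `transformers` API. The snippet below is a minimal sketch; the repository ID `Playingyoyo/GPepT` is an assumption and should be replaced with the actual model ID shown on this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub ID -- replace with the actual GPepT repository on the Hub.
model_id = "Playingyoyo/GPepT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Roughly 738M parameters expected for the 36-layer, d_model=1280 configuration.
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```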
 
To leverage GPepT’s pre-trained weights, input molecules must be converted into a standardized sequence-like representation of peptidomimetics using [**Monomerizer**](https://github.com/tsudalab/Monomerizer/tree/main). Detailed insights into the training process and datasets are provided in our accompanying publication.
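
Once an input molecule has been monomerized, generation follows the usual causal-language-model pattern. A minimal sketch, reusing the hypothetical Hub ID from above; the prompt here is only a placeholder, since the exact token format produced by Monomerizer is not specified in this README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Playingyoyo/GPepT"  # hypothetical Hub ID (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt: in practice, pass a Monomerizer-produced sequence-like
# representation of the input molecule instead.
prompt = "MKT"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,  # sampling yields diverse candidate sequences
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```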
 
Unlike traditional protein design models, GPepT is trained in a self-supervised manner, using raw sequence data without explicit annotation. This design enables the model to generalize across diverse sequence spaces, producing functional antimicrobial peptidomimetics upon fine-tuning.
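
The same self-supervised objective makes fine-tuning straightforward: train on raw monomerized sequences with next-token prediction and no labels. A minimal sketch using the `transformers` `Trainer`, where both the Hub ID and the `sequences.txt` file (one monomerized sequence per line) are assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Playingyoyo/GPepT"  # hypothetical Hub ID (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# GPT-2-style tokenizers often lack a pad token; reuse EOS for batch padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder file: one monomerized sequence per line.
dataset = load_dataset("text", data_files={"train": "sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal (next-token) objective used during pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpept-finetuned", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```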