wabu
/

AmpGPT2

wabu committed on Nov 18, 2024

Commit f8c84db (verified) · 1 parent: 428c239

Update README.md

README.md +26 -12

@@ -18,9 +18,11 @@ AmpGPT2 is a language model capable of generating de novo antimicrobial peptides
 AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.
-To validate the results the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html) was used.
-It is a deep learning tool specifically designed for AMP recognition.
 ## Training and evaluation data
@@ -30,7 +32,7 @@ AmpGPT2 was trained using 32014 AMP sequences from the Compass (https://compass.
 The example code below contains the ideal generation settings found while testing.
 The 'num_return_sequences' parameter specifies the amount of sequences generated. When generating more than 100 sequences at the same time, I recommend doing it in batches.
-The results can then be checked with the peptide scanner (https://www.dveltri.com/ascan/v2/ascan.html).
 ```
 from transformers import pipeline
 from transformers import GPT2LMHeadModel, GPT2Tokenizer
@@ -49,7 +51,7 @@ for i, seq in enumerate(amp_sequences):
     print(f">{sequence_identifier}\n{sequence}")
 ```
-### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
@@ -60,14 +62,24 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 50.0
-The model was trained on four NVIDIA A100 GPUs.
-### Training results
-| Training Loss | Epoch | Validation Loss | Accuracy |
-|:-------------:|:-----:|:---------------:|:--------:|
-| 3.7948        | 50.0  | 3.9890          | 0.4213   |
 ### Framework versions
@@ -75,3 +87,5 @@ The model was trained on four NVIDIA A100 GPUs.
 - Pytorch 2.2.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.0

 AmpGPT2 is a fine-tuned version of [nferruz/ProtGPT2](https://huggingface.co/nferruz/ProtGPT2) based on the GPT2 Transformer architecture.
+| Training Loss | Epoch | Validation Loss | Accuracy |
+|:-------------:|:-----:|:---------------:|:--------:|
+| 3.7948        | 50.0  | 3.9890          | 0.4213   |
+To validate the results the Antimicrobial Peptide Scanner vr.2 (https://www.dveltri.com/ascan/v2/ascan.html) was used, which is a deep learning tool specifically designed for AMP recognition.
 ## Training and evaluation data
 The example code below contains the ideal generation settings found while testing.
 The 'num_return_sequences' parameter specifies the amount of sequences generated. When generating more than 100 sequences at the same time, I recommend doing it in batches.
+The results can then be checked with the peptide scanner.
 ```
 from transformers import pipeline
 from transformers import GPT2LMHeadModel, GPT2Tokenizer
     print(f">{sequence_identifier}\n{sequence}")
 ```
+### Training hyperparameters and results
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
 - lr_scheduler_type: linear
 - num_epochs: 50.0
+\begin{table}[h!]
+    \centering
+    \caption{AMP Yield Comparison between AmpGPT2 and ProtGPT2}
+    \begin{tabular}{lccc}
+        \toprule
+        Model & Total Sequences & AMP Classified & AMP Percentage (AMP\%) \\
+        \midrule
+        AmpGPT2 & 10000 & 9541 & 95.41\% \\
+        ProtGPT2 & 10000 & 5530 & 55.3\% \\
+        \bottomrule
+    \end{tabular}
+    \label{tab:amp_yield}
+\end{table}
+| Model | Amp% | Length |
+|:-------:|:-----:|:-------:|
+|AmpGPT2|95.86|64.08   |
+|ProtGPT2| 51.85 | 222.59 |
 ### Framework versions
 - Pytorch 2.2.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.0
+The model was trained on four NVIDIA A100 GPUs.
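
The README's advice to generate in batches when more than 100 sequences are needed can be sketched as follows. This is a minimal illustration, not the author's code: `generate_in_batches`, `to_fasta`, and the `generate_fn` callable are hypothetical names. `generate_fn` is assumed to wrap the README's pipeline call, with `num_return_sequences` set to the requested count, and to return plain sequence strings. The generation settings themselves are not reproduced here.

```python
from typing import Callable, List


def generate_in_batches(
    generate_fn: Callable[[int], List[str]],
    total: int,
    batch_size: int = 100,
) -> List[str]:
    """Collect `total` sequences, requesting at most `batch_size` per call.

    `generate_fn(n)` is a hypothetical callable that returns up to `n`
    sequences, e.g. a wrapper around the README's pipeline call.
    """
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")

    sequences: List[str] = []
    while len(sequences) < total:
        # Ask only for what is still missing, capped at the batch size.
        n = min(batch_size, total - len(sequences))
        batch = generate_fn(n)
        if not batch:
            # Avoid looping forever if the generator yields nothing.
            raise RuntimeError("generate_fn returned no sequences")
        sequences.extend(batch[:n])
    return sequences


def to_fasta(sequences: List[str], prefix: str = "seq") -> str:
    """Format sequences as `>identifier` / sequence records,
    matching the layout printed by the README example."""
    return "\n".join(f">{prefix}_{i}\n{s}" for i, s in enumerate(sequences))
```

With a wrapper in place, `generate_in_batches(generate_fn, total=250)` would issue three calls (100, 100, and 50 sequences), and the output of `to_fasta` could then be saved to a file for checking with the peptide scanner.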