AI4PD
/

REXzyme

@@ -83,8 +83,6 @@ REXzyme was pre-trained with a supervised translation objective  i.e., the model
 There are stark differences in the number of members among reaction classes. Since we tokenize the reaction SMILES at the character level, classes with few reactions can profit from the knowledge gained from classes that catalyze similar reactions and have many members.
-The figure below summarizes the process of training: (add figure) [STILL MISSING!]
 ## **Model Performance**
@@ -96,12 +94,11 @@ We converted the reactions from rxn format to smile string including only left-t
     | Method                | Natural     | Generated                |
     | :---                  |    :----:   |          ---:            |
     | **IUPRED3 (ordered)** | 99.9%       | 99.9%                    |
-    | **ESMFold**           | 85.03       | 71.59 (selected: 79.82)  |
-    | **FlDPnn**            | missing     | 0.0929                  |
-    | **PSIpred**           | missing     | missing                  |
 <br/><br/>
-- **PGP pipeline**
     | Method      | Natural | Generated |
     | :---        | :----   |      :--- |
@@ -119,16 +116,20 @@ We converted the reactions from rxn format to smile string including only left-t
     | Syntax              | Identity    | Alignment length |
     | :---                |    :----:   |          ---:    |
     | **Generated**       | 74.29%      | 406.0            |
-    | **Selection (<70%)**| 57.20%      | 338.1            |
 <br/><br/>
 ## **How to generate from REXzyme**
 REXzyme can be used with the Hugging Face `transformers` Python package.
-Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation)
 Since REXzyme was trained with a machine translation objective, users have to specify a chemical reaction in SMILES format.
-Disclaimer: Although the perplexity gets computed here it is not the best selection criteria. Usually the BLEU score is deployed for translation evaluation, but this score would enforce a high sequence similarity thus not *de novo* design. We recommend generating many sequences and selecting them by plDDT as well as low identity.
 ```python
 from datasets import load_from_disk

 There are stark differences in the number of members among reaction classes. Since we tokenize the reaction SMILES at the character level, classes with few reactions can profit from the knowledge gained from classes that catalyze similar reactions and have many members.
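
The character-level tokenization mentioned above can be illustrated with a minimal sketch. It is not REXzyme's actual tokenizer; the helper names and the two reaction SMILES are hypothetical and chosen only to show why a rare reaction class can reuse tokens already learned from a larger class.

```python
def char_tokenize(reaction_smiles: str) -> list[str]:
    """Split a reaction SMILES into single-character tokens (illustrative only)."""
    return list(reaction_smiles)


def shared_token_fraction(rare: str, frequent: str) -> float:
    """Fraction of the rare reaction's distinct tokens that also occur in the frequent one."""
    rare_tokens = set(char_tokenize(rare))
    frequent_tokens = set(char_tokenize(frequent))
    return len(rare_tokens & frequent_tokens) / len(rare_tokens)


# Hypothetical reaction SMILES, not taken from the training data.
rare_reaction = "CCO.O=O>>CC=O.OO"
frequent_reaction = "CC(O)C.O=O>>CC(=O)C.OO"

print(char_tokenize("CC=O"))
print(shared_token_fraction(rare_reaction, frequent_reaction))
```

With a character vocabulary, every distinct token of the rare reaction here also appears in the frequent one, so no token embedding has to be learned from the small class alone.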
 ## **Model Performance**
     | Method                | Natural     | Generated                |
     | :---                  |    :----:   |          ---:            |
     | **IUPRED3 (ordered)** | 99.9%       | 99.9%                    |
+    | **ESMFold (avg. pLDDT)** | 85.03       | 71.59 (selected: 79.82)  |
+    | **FlDPnn**            | wip     | 0.0929                  |
 <br/><br/>
+- **PGP pipeline** [(see GitHub)](https://github.com/hefeda/PGP)
     | Method      | Natural | Generated |
     | :---        | :----   |      :--- |
     | Syntax              | Identity    | Alignment length |
     | :---                |    :----:   |          ---:    |
     | **Generated**       | 74.29%      | 406.0            |
+    | **Selection (<70%)**<sup>[1]</sup> | 57.20%      | 338.1            |
 <br/><br/>
+<sup>[1]</sup> We excluded sequences with ≥ 70% identity.
 ## **How to generate from REXzyme**
 REXzyme can be used with the Hugging Face `transformers` Python package.
+Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation).
 Since REXzyme was trained with a machine translation objective, users have to specify a chemical reaction in SMILES format.
+Disclaimer: Although the perplexity is computed here, it is not the best selection criterion.
+Usually the BLEU score is used for translation evaluation,
+but this score would enforce high sequence similarity (and thus not *de novo* design, which is what we aim for).
+We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
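
As a rough illustration of the recommended post-generation selection, the sketch below filters candidates by mean pLDDT and by identity to the closest natural sequence. It is an assumption-laden example, not part of the REXzyme code: the `Candidate` class, the `select_candidates` helper, the pLDDT cutoff of 70, and the toy values are all hypothetical, while the identity cutoff mirrors the "Selection (<70%)" row in the table above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str
    plddt: float     # mean pLDDT from a structure predictor, 0-100
    identity: float  # percent identity to the closest natural sequence


def select_candidates(candidates, min_plddt=70.0, max_identity=70.0):
    """Keep confident, sufficiently novel sequences, best pLDDT first.

    Both thresholds are illustrative defaults, not values prescribed by the model card.
    """
    kept = [
        c for c in candidates
        if c.plddt >= min_plddt and c.identity < max_identity
    ]
    return sorted(kept, key=lambda c: c.plddt, reverse=True)


# Toy values for demonstration only; they are not REXzyme results.
candidates = [
    Candidate("MKT", plddt=82.1, identity=55.0),
    Candidate("MAA", plddt=64.0, identity=40.0),  # dropped: low pLDDT
    Candidate("MGG", plddt=90.3, identity=88.0),  # dropped: too similar
    Candidate("MLL", plddt=75.5, identity=69.9),
]
selected = select_candidates(candidates)
print([c.sequence for c in selected])
```

In practice the pLDDT and identity values would come from separate tools (for example a structure predictor and a sequence search), which this sketch does not run.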
 ```python
 from datasets import load_from_disk