Commit a6cea68 by nferruz · 1 Parent(s): 5993d03

Update README.md

  1. README.md +10 -9
README.md CHANGED
@@ -83,8 +83,6 @@ REXzyme was pre-trained with a supervised translation objective i.e., the model
83
 
84
  There are stark differences in the number of members among reaction classes. Since we tokenize the reaction SMILES at the character level, classes with few reactions can profit from the knowledge gained from classes with many members that catalyze similar reactions.
85
 
86
- The figure below summarizes the process of training: (add figure) [STILL MISSING!]
87
-
88
 
89
  ## **Model Performance**
90
 
@@ -96,12 +94,11 @@ We converted the reactions from rxn format to smile string including only left-t
96
  | Method | Natural | Generated |
97
  | :--- | :----: | ---: |
98
  | **IUPRED3 (ordered)** | 99.9% | 99.9% |
99
- | **ESMFold** | 85.03 | 71.59 (selected: 79.82) |
100
- | **FlDPnn** | missing | 0.0929 |
101
- | **PSIpred** | missing | missing |
102
  <br/><br/>
103
 
104
- - **PGP pipeline**
105
 
106
  | Method | Natural | Generated |
107
  | :--- | :---- | :--- |
@@ -119,16 +116,20 @@ We converted the reactions from rxn format to smile string including only left-t
119
  | Syntax | Identity | Alignment length |
120
  | :--- | :----: | ---: |
121
  | **Generated** | 74.29% | 406.0 |
122
- | **Selection (<70%)**| 57.20% | 338.1 |
123
  <br/><br/>
 
124
 
125
  ## **How to generate from REXzyme**
126
  REXzyme can be used with the Hugging Face `transformers` Python package.
127
- Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation)
128
 
129
  Since REXzyme was trained with a machine-translation objective, users have to specify a chemical reaction in SMILES format.
130
 
131
- Disclaimer: Although perplexity is computed here, it is not the best selection criterion. Translation is usually evaluated with the BLEU score, but that score would enforce high sequence similarity and thus not *de novo* design. We recommend generating many sequences and selecting them by pLDDT as well as low identity.
 
 
 
132
 
133
  ```python
134
  from datasets import load_from_disk
 
83
 
84
  There are stark differences in the number of members among reaction classes. Since we tokenize the reaction SMILES at the character level, classes with few reactions can profit from the knowledge gained from classes with many members that catalyze similar reactions.
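
As a rough illustration of what character-level tokenization means for a reaction SMILES, the sketch below splits a string into single characters. The example reaction and the `char_tokenize` helper are hypothetical; this is not REXzyme's actual tokenizer, which may add special tokens or use a fixed vocabulary.

```python
def char_tokenize(reaction_smiles: str) -> list[str]:
    """Split a reaction SMILES into single-character tokens (illustrative only)."""
    return list(reaction_smiles)


# Hypothetical example: ethanol oxidised to acetaldehyde, written left-to-right.
tokens = char_tokenize("CCO>>CC=O")
print(tokens)  # ['C', 'C', 'O', '>', '>', 'C', 'C', '=', 'O']
```

With one token per character, reactions from sparsely populated classes draw on the same small vocabulary as reactions from well-populated classes.
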
85
 
 
 
86
 
87
  ## **Model Performance**
88
 
 
94
  | Method | Natural | Generated |
95
  | :--- | :----: | ---: |
96
  | **IUPRED3 (ordered)** | 99.9% | 99.9% |
97
+ | **ESMFold (avg. pLDDT)** | 85.03 | 71.59 (selected: 79.82) |
98
+ | **FlDPnn** | wip | 0.0929 |
 
99
  <br/><br/>
100
 
101
+ - **PGP pipeline** [(see GitHub)](https://github.com/hefeda/PGP)
102
 
103
  | Method | Natural | Generated |
104
  | :--- | :---- | :--- |
 
116
  | Syntax | Identity | Alignment length |
117
  | :--- | :----: | ---: |
118
  | **Generated** | 74.29% | 406.0 |
119
+ | **Selection (<70%)**<sup>[1]</sup> | 57.20% | 338.1 |
120
  <br/><br/>
121
+ <sup>[1]</sup> We excluded sequences with ≥ 70% identity.
122
 
123
  ## **How to generate from REXzyme**
124
  REXzyme can be used with the Hugging Face `transformers` Python package.
125
+ Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation).
126
 
127
  Since REXzyme was trained with a machine-translation objective, users have to specify a chemical reaction in SMILES format.
128
 
129
+ Disclaimer: Although perplexity is computed here, it is not the best selection criterion.
130
+ Translation is usually evaluated with the BLEU score,
131
+ but that score would enforce high sequence similarity (and thus not *de novo* design, which is what we usually aim for).
132
+ We recommend generating many sequences and selecting them by pLDDT as well as other metrics.
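
The selection step recommended above could be sketched as follows, assuming the mean pLDDT and the highest percent identity to natural sequences have already been computed for each generated sequence. The `Candidate` container, the `select_candidates` helper, the toy sequences, and the pLDDT cutoff of 70 are all hypothetical choices for illustration; only the 70% identity cutoff mirrors the table footnote.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A generated sequence with precomputed scores (hypothetical container)."""
    sequence: str
    mean_plddt: float    # average pLDDT of the predicted structure, 0-100
    max_identity: float  # highest % identity to any natural sequence


def select_candidates(candidates, min_plddt=70.0, max_identity=70.0):
    """Keep confidently folded, sufficiently novel sequences, best pLDDT first."""
    kept = [
        c for c in candidates
        if c.mean_plddt >= min_plddt and c.max_identity < max_identity
    ]
    return sorted(kept, key=lambda c: c.mean_plddt, reverse=True)


# Toy pool with made-up scores, only to show the filtering behaviour.
pool = [
    Candidate("MKTAYIAK", mean_plddt=82.1, max_identity=55.0),
    Candidate("MSLLQDRE", mean_plddt=64.3, max_identity=40.2),  # low pLDDT
    Candidate("MAHHGTWP", mean_plddt=90.5, max_identity=88.7),  # too similar
    Candidate("MGNVELRS", mean_plddt=75.0, max_identity=69.9),
]
selected = select_candidates(pool)
print([c.sequence for c in selected])
```

Both thresholds are keyword arguments, so they can be tightened or relaxed depending on how many sequences were generated.
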
133
 
134
  ```python
135
  from datasets import load_from_disk