Update README.md
README.md
CHANGED
@@ -83,8 +83,6 @@ REXzyme was pre-trained with a supervised translation objective i.e., the model
 
 There are stark differences in the number of members among reaction classes. Since we are tokenizing the reaction SMILES on a character level, classes with few reactions can profit from the knowledge gained for classes catalyzing similar reactions that have a lot of members.
 
-The figure below summarizes the process of training: (add figure) [STILL MISSING!]
-
 
 ## **Model Performance**
 
@@ -96,12 +94,11 @@ We converted the reactions from rxn format to smile string including only left-t
 | Method | Natural | Generated |
 | :--- | :----: | ---: |
 | **IUPRED3 (ordered)** | 99.9% | 99.9% |
-| **ESMFold** | 85.03 | 71.59 (selected: 79.82) |
-| **FlDPnn** |
-| **PSIpred** | missing | missing |
+| **ESMFold (avg. pLDDT)** | 85.03 | 71.59 (selected: 79.82) |
+| **FlDPnn** | wip | 0.0929 |
 <br/><br/>
 
-- **PGP pipeline**
+- **PGP pipeline** [(see GitHub)](https://github.com/hefeda/PGP)
 
 | Method | Natural | Generated |
 | :--- | :---- | :--- |
@@ -119,16 +116,20 @@ We converted the reactions from rxn format to smile string including only left-t
 | Syntax | Identity | Alignment length |
 | :--- | :----: | ---: |
 | **Generated** | 74.29% | 406.0 |
-| **Selection (<70%)
+| **Selection (<70%)**<sup>[1]</sup> | 57.20% | 338.1 |
 <br/><br/>
+<sup>[1]</sup> We excluded sequences with ≥ 70% identity.
 
 ## **How to generate from REXzyme**
 REXzyme can be used with the Hugging Face transformers Python package.
-Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation)
+Detailed installation instructions can be found [here](https://huggingface.co/docs/transformers/installation).
 
 Since REXzyme was trained with a machine translation objective, users have to specify a chemical reaction in SMILES format.
 
-Disclaimer: Although the perplexity gets computed here, it is not the best selection criterion.
+Disclaimer: Although the perplexity gets computed here, it is not the best selection criterion.
+Usually the BLEU score is used for translation evaluation,
+but this score would enforce a high sequence similarity (thus not *de novo* design, which is what we tend to go for).
+We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
 
 ```python
 from datasets import load_from_disk