Update README.md
README.md
CHANGED
|
@@ -60,13 +60,6 @@ However, since we are tokenizing the reaction SMILES on a character level,
|
|
| 60 |
the model has learnt dependencies among molecules and enzyme sequence features, and it can transfer learning from more to less populated
|
| 61 |
reaction classes.
|
| 62 |
|
| 63 |
-
## **Model Performance**
|
| 64 |
-
|
| 65 |
-
- **Dataset curation**
|
| 66 |
-
We converted the reactions from RXN format to SMILES strings, including only left-to-right reactions.
|
| 67 |
-
The enzyme sequences were truncated to 1024.
|
| 68 |
-
Enzymes catalyzing more than one reaction appear in multiple enzyme-reaction pairs.
|
| 69 |
-
|
| 70 |
|
| 71 |
|
| 72 |
## **How to generate from REXzyme**
|
|
@@ -80,6 +73,10 @@ Usually the BLEU score is deployed for translation evaluation,
|
|
| 80 |
but this score would enforce a high sequence similarity (thus not *de novo* design, which is what we tend to go for).
|
| 81 |
We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
|
| 82 |
|
| 83 |
```python
|
| 84 |
"""Inference on a SMILES txt. Saved as fastas
|
| 85 |
Previously called generate_comparison"""
|
|
@@ -91,7 +88,7 @@ if __name__ == '__main__':
|
|
| 91 |
import torch
|
| 92 |
import json
|
| 93 |
|
| 94 |
-
parser = argparse.ArgumentParser(description='
|
| 95 |
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
|
| 96 |
parser.add_argument('--input_file', default='../inference/random_smiles2.txt', type=str,
|
| 97 |
help='File with the input molecule SMILES')
|
|
|
|
| 60 |
the model has learnt dependencies among molecules and enzyme sequence features, and it can transfer learning from more to less populated
|
| 61 |
reaction classes.
|
| 62 |
|
| 63 |
|
| 64 |
|
| 65 |
## **How to generate from REXzyme**
|
|
|
|
| 73 |
but this score would enforce a high sequence similarity (thus not *de novo* design, which is what we tend to go for).
|
| 74 |
We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
|
| 75 |
|
| 76 |
+
Before running the inference script, create a text file containing the desired input reaction SMILES. Note that if there are multiple reaction SMILES in the same file
|
| 77 |
+
but on separate lines, the model will generate sequences for each reaction independently, creating a separate output file for each of them.
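
As a rough illustration of the input file described above, the sketch below writes one reaction SMILES per line to a text file. The file name `input_smiles.txt`, the helper `write_smiles_file`, and the two example reactions are hypothetical and chosen only for illustration; the exact reaction SMILES conventions the model expects are not specified here, so treat this as an assumption rather than the authors' procedure.

```python
from pathlib import Path

# Hypothetical reaction SMILES, used only to illustrate the file layout.
reactions = [
    "CCO>>CC=O",
    "CC(=O)O.CO>>CC(=O)OC.O",
]


def write_smiles_file(path, smiles_list):
    """Write one reaction SMILES per line; return the number of lines written."""
    # Drop empty entries so the file contains no blank lines.
    lines = [s.strip() for s in smiles_list if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical file name; with two lines, two output files would be expected.
n_written = write_smiles_file("input_smiles.txt", reactions)
```

The resulting file could then be passed to the inference script through its `--input_file` argument.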
|
| 78 |
+
|
| 79 |
+
|
| 80 |
```python
|
| 81 |
"""Inference on a SMILES txt. Saved as fastas
|
| 82 |
Previously called generate_comparison"""
|
|
|
|
| 88 |
import torch
|
| 89 |
import json
|
| 90 |
|
| 91 |
+
parser = argparse.ArgumentParser(description='inference',
|
| 92 |
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
|
| 93 |
parser.add_argument('--input_file', default='../inference/random_smiles2.txt', type=str,
|
| 94 |
help='File with the input molecule SMILES')
|