AI4PD/REXzyme


nuriamimbreropelegri committed on Jun 16, 2025

Commit d4218c2 · verified · 1 Parent(s): 8b5718b

Update README.md

Files changed (1)

README.md +5 -8


@@ -60,13 +60,6 @@ However, since we are tokenizing the reaction SMILES on a character level,
 the model has learnt dependencies among molecules and enzyme sequence features, and it can transfer learning from more to less populated
 reaction classes.
-## **Model Performance**
-- **Dataset curation**
-We converted the reactions from rxn format to smile string including only left-to-right reactions.
-The enzyme sequences were truncated to 1024.
-Enzymes catalyzing more than one reaction appear in multiple enzyme-reaction pairs.
 ## **How to generate from REXzyme**
@@ -80,6 +73,10 @@ Usually the BLEU score is deployed for translation evaluation,
 but this score would enforce a high sequence similarity (thus not *de novo* design, which is what we tend to go for).
 We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
 ```python
 """Inference on a SMILES txt. Saved as fastas
 Previously called generate_comparison"""
@@ -91,7 +88,7 @@ if __name__ == '__main__':
     import torch
     import json
-    parser = argparse.ArgumentParser(description='Mol2Pro inference',
                                          formatter_class=argparse.ArgumentDefaultsHelpFormatter)
     parser.add_argument('--input_file', default='../inference/random_smiles2.txt', type=str,
                         help='File with the input molecule SMILES')

 the model has learnt dependencies among molecules and enzyme sequence features, and it can transfer learning from more to less populated
 reaction classes.
 ## **How to generate from REXzyme**
 but this score would enforce a high sequence similarity (thus not *de novo* design, which is what we tend to go for).
 We recommend generating many sequences and selecting them by pLDDT, as well as other metrics.
+Before running the inference script, create a text file containing the desired input reaction SMILES. Note that if the file contains multiple reaction SMILES
+on separate lines, the model will generate sequences for each reaction independently, creating a separate output file for each of them.
 ```python
 """Inference on a SMILES txt. Saved as fastas
 Previously called generate_comparison"""
     import torch
     import json
+    parser = argparse.ArgumentParser(description='inference',
                                          formatter_class=argparse.ArgumentDefaultsHelpFormatter)
     parser.add_argument('--input_file', default='../inference/random_smiles2.txt', type=str,
                         help='File with the input molecule SMILES')
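
The sentences added to the README describe an input file holding one reaction SMILES per line, with one output file produced per reaction. The sketch below illustrates that layout only; it is not the repository's inference script. The helper names `read_reaction_smiles` and `output_name`, the file name `input_smiles.txt`, the `reaction_<i>.fasta` naming scheme, and the `reactants>>products` example strings are all assumptions made for illustration, and the exact reaction SMILES format REXzyme expects should be checked against the model card.

```python
from pathlib import Path
import tempfile


def read_reaction_smiles(path):
    """Return the non-empty lines of a text file, one reaction SMILES per line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def output_name(index):
    """Hypothetical naming scheme: one FASTA file per input reaction."""
    return f"reaction_{index}.fasta"


# Illustrative reaction SMILES (ester hydrolysis), written as reactants>>products.
example_reactions = [
    "CC(=O)OCC.O>>CC(=O)O.CCO",
    "CC(=O)OC.O>>CC(=O)O.CO",
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the input file: one reaction per line.
    input_file = Path(tmp) / "input_smiles.txt"
    input_file.write_text("\n".join(example_reactions) + "\n")

    # Read it back and plan one output file per reaction.
    reactions = read_reaction_smiles(input_file)
    outputs = [output_name(i) for i, _ in enumerate(reactions)]
```

A file prepared this way would then be passed to the script through the `--input_file` argument shown in the diff.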