ribesstefano committed on
Commit
a1edd6a
·
verified ·
1 Parent(s): 3b05f89

Update protac_splitter_app.py

Files changed (1)
  1. protac_splitter_app.py +5 -5
protac_splitter_app.py CHANGED
@@ -252,9 +252,9 @@ You can choose which model to use for splitting PROTAC molecules:
  - If both are selected, the Transformer model will be used first, then if it fails, the XGBoost model will be used.
  - If no model is selected, splitting will be done using graph-based heuristics, with no AI model involved.
 
- For fast splitting, we recommend using the XGBoost model only, which is fast and efficient for most cases.
 
- The Transformer model runs on CPU, so it is slower, especially for processing large CSV files.
  """)
  with gr.Row():
  with gr.Column(scale=2):
@@ -277,9 +277,9 @@ For single SMILES processing, the default values should work well in most cases.
  label="Number of Processes",
  value=2,
  minimum=1,
- maximum=8,
  step=1,
- info="Number of processes to use for parallel processing. Higher values may improve performance but require more memory."
  )
 
  # Add a number input for beam_size if Transformer model is selected
@@ -323,7 +323,7 @@ For single SMILES processing, the default values should work well in most cases.
  # ----------------------------------------------------------------------
  gr.Markdown("""## Specify Inputs
 
- **Disclaimer**: The input SMILES is checked for validity before processing. There is no check on whether the SMILES is a PROTAC-like molecule or not.
  For example, attempting to split the SMILES `c1ccccc1` (benzene) with the XGBoost or heuristic strategies will return an error, as ring bonds are ignored for splitting.
  On the other hand, `c1ccccc1CCC1CCCC1` will return a plausible split, even though it is not a PROTAC molecule.
  """)
 
  - If both are selected, the Transformer model will be used first, then if it fails, the XGBoost model will be used.
  - If no model is selected, splitting will be done using graph-based heuristics, with no AI model involved.
 
+ For fast splitting, the heuristic and XGBoost strategies are efficient for most cases. The Transformer model, on the other hand, runs on CPU, so it is slower, especially when processing large CSV files.
 
+ To choose the right model for splitting large datasets (in the `Upload CSV` tab), we recommend first testing _all_ the available models (heuristic, XGBoost, and Transformer) on a few PROTACs in the `Single SMILES Input` tab and checking the quality of the splits.
  """)
  with gr.Row():
  with gr.Column(scale=2):
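
The model-selection rules quoted in this hunk (Transformer first, XGBoost if it fails, heuristics when no model is selected) can be sketched as a small fallback chain. This is only an illustration and not the app's actual code: `split_with_fallback` and the three stub splitters are hypothetical names, and the sketch assumes a failed split is signalled either by an exception or by a `None` result.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical splitter signature: SMILES in, substructure dict out.
Splitter = Callable[[str], Optional[Dict[str, str]]]


def split_with_fallback(
    smiles: str,
    selected_models: Sequence[Splitter],
    heuristic: Splitter,
) -> Optional[Dict[str, str]]:
    """Try the selected models in order; use heuristics only if none is selected."""
    if not selected_models:
        # No model selected: graph-based heuristics, no AI model involved.
        return heuristic(smiles)
    for model in selected_models:
        try:
            result = model(smiles)
        except Exception:
            # Assumption: an exception counts as a failed split.
            result = None
        if result is not None:
            return result
    # Every selected model failed; the source does not say what happens next.
    return None


# Stub splitters standing in for the real models (illustrative only).
def stub_transformer(smiles: str) -> Optional[Dict[str, str]]:
    raise RuntimeError("no valid split produced")


def stub_xgboost(smiles: str) -> Optional[Dict[str, str]]:
    return {"source": "xgboost", "input": smiles}


def stub_heuristic(smiles: str) -> Optional[Dict[str, str]]:
    return {"source": "heuristic", "input": smiles}
```

With both stub models selected, the call falls through to the XGBoost stub because the Transformer stub raises; with an empty selection, the heuristic stub is used.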
 
  label="Number of Processes",
  value=2,
  minimum=1,
+ maximum=2,
  step=1,
+ info="Number of processes to use for parallel processing. Higher values may improve performance but require more memory. (Capped to 2 in this HF Space)"
  )
 
  # Add a number input for beam_size if Transformer model is selected
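
The commit hard-codes the upper bound of the "Number of Processes" control to 2. As a hedged alternative sketch, the bound could be derived from the CPUs visible to the process and an optional hard cap. The helper name `max_processes` and its parameters are assumptions for illustration and do not appear in `protac_splitter_app.py`.

```python
import os
from typing import Optional


def max_processes(hard_cap: Optional[int] = None, cpu_count: Optional[int] = None) -> int:
    """Return an upper bound for the worker-process control.

    hard_cap:  optional deployment limit (e.g. 2 on a small hosted instance).
    cpu_count: override for testing; defaults to os.cpu_count().
    """
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    limit = max(1, available)  # never offer fewer than one process
    if hard_cap is not None:
        limit = min(limit, max(1, hard_cap))
    return limit
```

The result could then be passed as `maximum=` to the control, with the default `value` clamped to the same bound so that the default never exceeds the maximum.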
 
  # ----------------------------------------------------------------------
  gr.Markdown("""## Specify Inputs
 
+ **Disclaimer**: The input SMILES is checked for validity before processing. However, there is no check on whether the SMILES is a PROTAC-like molecule or not.
  For example, attempting to split the SMILES `c1ccccc1` (benzene) with the XGBoost or heuristic strategies will return an error, as ring bonds are ignored for splitting.
  On the other hand, `c1ccccc1CCC1CCCC1` will return a plausible split, even though it is not a PROTAC molecule.
  """)
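
The disclaimer's two examples follow from the statement that ring bonds are ignored for splitting: benzene has only ring bonds, so no candidate bond remains, while the second molecule has a short chain between its two rings. The sketch below illustrates this without a cheminformatics toolkit. It assumes hand-written bond lists (atom index pairs) instead of parsed SMILES, and it treats a bond as a ring bond exactly when removing it leaves its two atoms connected. The app presumably relies on a toolkit for this, so the function `acyclic_bonds` is purely illustrative.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int]


def acyclic_bonds(bonds: List[Bond]) -> List[Bond]:
    """Return the bonds that are not part of any ring (assumes unique bonds)."""
    candidates = []
    for skipped in bonds:
        # Build the adjacency map without the bond under test.
        adjacency: Dict[int, Set[int]] = {}
        for a, b in bonds:
            adjacency.setdefault(a, set())
            adjacency.setdefault(b, set())
            if (a, b) != skipped:
                adjacency[a].add(b)
                adjacency[b].add(a)
        # Breadth-first search from one end of the removed bond.
        start, target = skipped
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        # If the other end is unreachable, the bond was not in a ring.
        if target not in seen:
            candidates.append(skipped)
    return candidates


# Benzene: a single six-membered ring (atoms 0-5).
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# Benzene, a two-carbon chain (atoms 6-7), and a cyclopentane ring (atoms 8-12).
TWO_RINGS = BENZENE + [
    (5, 6), (6, 7), (7, 8),
    (8, 9), (9, 10), (10, 11), (11, 12), (12, 8),
]
```

For `BENZENE` the function returns an empty list, so there is nothing to split on. For `TWO_RINGS` it returns the three chain bonds, which is why a plausible-looking split comes back even though the molecule is not a PROTAC.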