Spaces: ailab-bio / PROTAC-Splitter-App

ribesstefano committed on Jul 24, 2025

Commit a1edd6a (verified) · 1 Parent(s): 3b05f89

Update protac_splitter_app.py

Files changed (1)

protac_splitter_app.py +5 -5


@@ -252,9 +252,9 @@ You can choose which model to use for splitting PROTAC molecules:
 - If both are selected, the Transformer model will be used first, then if it fails, the XGBoost model will be used.
 - If no model is selected, splitting will be done using graph-based heuristics, with no AI model involved.
-For fast splitting, we reccommend using the XGBoost model only, which is fast and efficient for most cases.
-The Transformer model runs on CPU, so it is slower, especially for processing large CSV files.
+For fast splitting, heuristic and XGBoost models are fast and efficient for most cases. On the other hand, the Transformer model runs on CPU, so it is slower, especially for processing large CSV files.
+For choosing the right model to split large datasets (in the `Upload CSV` tab), we reccommend to first testing out _all_ the available models (heuristic, XGBoost, and Transformer) on a few PROTACs in the `Single SMILES Input` tab and check the quality of the splits.
 """)
         with gr.Row():
             with gr.Column(scale=2):
@@ -277,9 +277,9 @@ For single SMILES processing, the default values should work well in most cases.
                     label="Number of Processes",
                     value=2,
                     minimum=1,
-                    maximum=8,
+                    maximum=2,
                     step=1,
-                    info="Number of processes to use for parallel processing. Higher values may improve performance but require more memory."
+                    info="Number of processes to use for parallel processing. Higher values may improve performance but require more memory. (Capped to 2 in this HF Space)"
                 )
             # Add a number input for beam_size if Transformer model is selected
@@ -323,7 +323,7 @@ For single SMILES processing, the default values should work well in most cases.
         # ----------------------------------------------------------------------
         gr.Markdown("""## Specify Inputs
-**Disclaimer**: The input SMILES is checked for validity before processing. There is no check on whether the SMILES is a PROTAC-like molecule or not.
+**Disclaimer**: The input SMILES is checked for validity before processing. However, there is no check on whether the SMILES is a PROTAC-like molecule or not.
 For example, attempting to split the SMILES `c1ccccc` (benzene) with the XGBoost or heuristic strategies will return an error, as ring bonds are ignored for splitting.
 On the other end, `c1ccccc1CCC1CCCC1` will return a plausible split, even though it is not a PROTAC molecule.
 """)
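
The behaviour described in the Markdown text of this diff (Transformer first, XGBoost if it fails, graph heuristics when no model is selected) and the new process cap can be illustrated with a minimal sketch. This is not code from `protac_splitter_app.py`: the names `Splitter`, `clamp_processes`, and `split_with_fallback` are hypothetical, and returning `None` when every selected model fails is an assumption, since the diff does not say what the app does in that case.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a splitter maps a SMILES string to a dict of
# substructures, or returns None when it cannot produce a split.
Splitter = Callable[[str], Optional[dict]]

# Mirrors the slider change in this commit (maximum=8 -> maximum=2).
MAX_PROCESSES = 2


def clamp_processes(requested: int, maximum: int = MAX_PROCESSES) -> int:
    """Clamp a requested process count to the range [1, maximum]."""
    return max(1, min(requested, maximum))


def split_with_fallback(
    smiles: str,
    selected: Sequence[Splitter],
    heuristic: Splitter,
) -> Optional[dict]:
    """Try the selected models in order; use the heuristic if none is selected.

    Assumption: if every selected model fails, None is returned.
    """
    if not selected:
        return heuristic(smiles)
    for splitter in selected:
        try:
            result = splitter(smiles)
        except Exception:
            result = None  # treat an exception as a failed split
        if result is not None:
            return result
    return None
```

In this sketch the order of `selected` encodes the priority, so passing `[transformer, xgboost]` reproduces the "Transformer first, then XGBoost" rule from the app text, and an empty list falls through to the heuristic.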