Spaces: IlPakoZ/DLRNA-BERTa

IlPakoZ committed on Dec 2, 2025

Commit 061dd46 (verified) · 1 Parent(s): 4780493

Added duplicate deletion, limit to input SMILES and updated infos

Files changed (2)

README.md +3 -1
app.py +69 -39

README.md CHANGED

@@ -63,7 +63,9 @@ C1CCCCC1O
 C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2
 ```
-The model will predict binding affinity for each drug-target pair sequentially. For visualizations, the results will display only the last SMILES entry.
 ### Programmatic usage

 C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2
 ```
+The model processes each drug–target pair in sequence to predict binding affinity. For visualization, only the final SMILES entry is shown.
+If “Remove duplicate SMILES” is enabled, any repeated SMILES strings are filtered out before analysis.
+Final results follow the original SMILES order, with adjusted index numbers when duplicates have been removed.
 ### Programmatic usage

app.py CHANGED

@@ -334,19 +334,41 @@ class DrugTargetInteractionApp:
 # Initialize the app
 app = DrugTargetInteractionApp()
-def predict_wrapper(target_seq, drug_smiles):
     """Wrapper function for Gradio interface"""
     if not target_seq.strip() or not drug_smiles.strip():
         return "Please provide both target sequence and drug SMILES."
-    return app.predict_interaction(target_seq, drug_smiles.split("\n"))
-def visualize_wrapper(target_seq, drug_smiles):
     """Wrapper function for visualization"""
     if not target_seq.strip() or not drug_smiles.strip():
         return None, None, None, "Please provide both target sequence and drug SMILES."
-    return app.visualize_interaction(target_seq, drug_smiles.split("\n"))
 def load_model_wrapper(model_path):
     """Wrapper function to load model"""
@@ -389,9 +411,15 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
                         "For multiple SMILES, enter each on a new line:\n"
                         "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\n"
                         "C1CCCCC1O",
-                    lines=3
                 )
                 with gr.Row():
                     predict_btn = gr.Button("🚀 Predict Interaction", variant="primary", size="lg")
                     visualize_btn = gr.Button("📊 Generate Visualizations", variant="secondary", size="lg")
@@ -417,7 +445,7 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
                     "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
                 ]
             ],
-            inputs=[target_input, drug_input],
             outputs=prediction_output,
             fn=predict_wrapper,
             cache_examples=False
@@ -426,13 +454,14 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         # Button click events
         predict_btn.click(
             fn=predict_wrapper,
-            inputs=[target_input, drug_input],
             outputs=prediction_output
         )
-        def visualize_and_update(target_seq, drug_smiles):
             """Generate visualizations and update both status and state"""
-            img1, img2, img3, status = visualize_wrapper(target_seq, drug_smiles)
             # Combine prediction result with visualization status
             combined_status = status + "\n\nVisualization analysis complete. Please navigate to the Visualizations tab to view the generated images."
             if len(drug_smiles) > 1:
@@ -442,11 +471,11 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         visualize_btn.click(
             fn=visualize_and_update,
-            inputs=[target_input, drug_input],
             outputs=[viz_state1, viz_state2, viz_state3, prediction_output],
-            api_name="visualize_and_update"  # Make this API accessible
         )
     with gr.Tab("📊 Visualizations"):
         gr.HTML("""
         <div style="text-align: center; margin-bottom: 20px;">
@@ -588,33 +617,34 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
     with gr.Tab("ℹ️ About"):
         gr.Markdown("""
         ## About this application
-        This application implements DLRNA-BERTa, a Dual Language RoBERTa Transformer model for predicting drug to RNA target interactions. The model architecture includes:
-        - **Target encoder**: Processes RNA sequences using RNA-BERTa
-        - **Drug encoder**: Processes molecular SMILES notation using ChemBERTa
-        - **Cross-attention mechanism**: Captures interactions between drugs and targets
-        - **Regression head**: Predicts binding affinity scores (pKd values)
-        ### Input requirements:
-        - **Target sequence**: RNA sequence of the target (nucleotide sequences: A, U, G, C)
-        - **Drug SMILES**: Simplified Molecular Input Line Entry System notation
-          - **Batch prediction**: Enter multiple SMILES strings, one per line, to predict binding affinity for multiple drugs against the same target
-        ### Model features:
-        - Cross-attention for drug-target interaction modeling
-        - Dropout for regularization
-        - Layer normalization for stable training
-        - Interpretability mode for contribution and attention visualization
-        - Support for batch predictions with multiple SMILES entries
-        ### Usage tips:
-        1. Load a trained model using the Model Settings tab (optional)
-        2. Enter a RNA sequence and drug SMILES in the Prediction & Analysis tab
-        3. For batch predictions, enter multiple SMILES strings (one per line) in the drug SMILES field
-        4. Click "Predict Interaction" for binding affinity prediction only
-        5. Click "Generate Visualizations" to create detailed interaction analysis - results will appear in the Visualizations tab
-        6. Note: Visualizations are generated only for the last SMILES entry when using batch mode
         For best results, ensure your input sequences are properly formatted and within reasonable length limits (max 512 tokens).

 # Initialize the app
 app = DrugTargetInteractionApp()
+def smiles_preprocessing(drug_smiles):
+    drugs = drug_smiles.strip().split("\n")
+    # Remove molecule duplicates in O(n) while preserving the order
+    seen = set()
+    sorted_drugs = []
+    kept = 0
+    for x in drugs:
+        if x not in seen:
+            seen.add(x)
+            sorted_drugs.append(x)
+            kept += 1
+    logger.info(f"{kept-len(drugs)} duplicate smiles removed!")
+    return sorted_drugs
+def predict_wrapper(target_seq, drug_smiles, remove_dups):
     """Wrapper function for Gradio interface"""
     if not target_seq.strip() or not drug_smiles.strip():
         return "Please provide both target sequence and drug SMILES."
+    target_seq = target_seq.strip()
+    drug_smiles = smiles_preprocessing(drug_smiles, remove_dups)
+    return app.predict_interaction(target_seq, drug_smiles)
+def visualize_wrapper(target_seq, drug_smiles, remove_dups):
     """Wrapper function for visualization"""
     if not target_seq.strip() or not drug_smiles.strip():
         return None, None, None, "Please provide both target sequence and drug SMILES."
+    target_seq = target_seq.strip()
+    drug_smiles = smiles_preprocessing(drug_smiles, remove_dups)
+    return app.visualize_interaction(target_seq, drug_smiles)
 def load_model_wrapper(model_path):
     """Wrapper function to load model"""
                         "For multiple SMILES, enter each on a new line:\n"
                         "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\n"
                         "C1CCCCC1O",
+                    lines=4,
+                    max_lines=2000
                 )
+                remove_dups_checkbox = gr.Checkbox(
+                    label="Remove duplicate SMILES",
+                    value=True
+                )
                 with gr.Row():
                     predict_btn = gr.Button("🚀 Predict Interaction", variant="primary", size="lg")
                     visualize_btn = gr.Button("📊 Generate Visualizations", variant="secondary", size="lg")
                     "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
                 ]
             ],
+            inputs=[target_input, drug_input, remove_dups_checkbox],
             outputs=prediction_output,
             fn=predict_wrapper,
             cache_examples=False
         # Button click events
         predict_btn.click(
             fn=predict_wrapper,
+            inputs=[target_input, drug_input, remove_dups_checkbox],
             outputs=prediction_output
         )
+        def visualize_and_update(target_seq, drug_smiles, remove_dups):
             """Generate visualizations and update both status and state"""
+            img1, img2, img3, status = visualize_wrapper(target_seq, drug_smiles, remove_dups)
             # Combine prediction result with visualization status
             combined_status = status + "\n\nVisualization analysis complete. Please navigate to the Visualizations tab to view the generated images."
             if len(drug_smiles) > 1:
         visualize_btn.click(
             fn=visualize_and_update,
+            inputs=[target_input, drug_input, remove_dups_checkbox],
             outputs=[viz_state1, viz_state2, viz_state3, prediction_output],
+            api_name="visualize_and_update"
         )
     with gr.Tab("📊 Visualizations"):
         gr.HTML("""
         <div style="text-align: center; margin-bottom: 20px;">
     with gr.Tab("ℹ️ About"):
         gr.Markdown("""
         ## About this application
+        This application implements DLRNA-BERTa, a Dual Language RoBERTa Transformer model for predicting drug-to-RNA target interactions. The architecture combines:
+        - **Target encoder**: RNA-BERTa for processing RNA sequences
+        - **Drug encoder**: ChemBERTa for SMILES representation
+        - **Cross-attention mechanism**: Captures interactions between drug and target
+        - **Regression head**: Predicts binding affinity (pKd)
+        ### Input requirements
+        - **Target sequence**: RNA sequence (A, U, G, C)
+        - **Drug SMILES**: One or more SMILES strings
+        - For batch mode, enter each SMILES on a new line (max 2000 entries)
+        - A checkbox option allows automatic removal of duplicate SMILES before prediction
+        ### Model features
+        - Cross-attention for drug-target interaction modeling
+        - Regularization via dropout
+        - Layer normalization for stable training
+        - Dedicated interpretability mode for visualization
+        - Batch prediction with optional de-duplication
+        ### Usage tips
+        1. Load a model (optional) in the Model Settings tab
+        2. Enter an RNA sequence and one or more SMILES strings
+        3. Use the **“Remove duplicate SMILES”** checkbox if you want duplicates filtered automatically
+        4. Click *Predict Interaction* for affinity scores
+        5. Click *Generate Visualizations* for interpretability plots
+        6. Visualizations are produced only for the final SMILES entry in batch mode
         For best results, ensure your input sequences are properly formatted and within reasonable length limits (max 512 tokens).
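
Review note: in the diff as shown, `smiles_preprocessing` is defined with a single parameter but both wrappers call it with `remove_dups`, and the logged value `kept-len(drugs)` can only be zero or negative. The following is a minimal sketch of one way to reconcile these, not the author's implementation. The `remove_dups` default, the blank-line skipping, and the `max_entries` parameter are assumptions for illustration. The diff does not show whether `max_lines=2000` alone enforces the documented 2000-entry cap, so the sketch checks the count explicitly.

```python
import logging

logger = logging.getLogger(__name__)


def smiles_preprocessing(drug_smiles, remove_dups=True, max_entries=2000):
    """Split newline-separated SMILES, optionally dropping repeated strings.

    Order of first appearance is preserved. `max_entries` is a hypothetical
    parameter mirroring the "max 2000 entries" note in the About text.
    """
    # Assumption: trim each line and skip blank ones (the diff keeps lines verbatim).
    drugs = [line.strip() for line in drug_smiles.strip().split("\n") if line.strip()]

    if remove_dups:
        # dict keys keep insertion order, so this removes repeats in O(n)
        # without reordering the remaining entries.
        unique_drugs = list(dict.fromkeys(drugs))
        logger.info("%d duplicate SMILES removed", len(drugs) - len(unique_drugs))
        drugs = unique_drugs

    if len(drugs) > max_entries:
        raise ValueError(f"Too many SMILES entries: {len(drugs)} > {max_entries}")
    return drugs
```

As in the committed code, duplicates are detected by exact string comparison, so two different SMILES strings that encode the same molecule are both kept.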